FIELD OF THE INVENTION 

The present invention relates to novel regulatory elements and vectors for the 
expression of one or more proteins in a host cell. 

BACKGROUND OF THE INVENTION 

Methods for expression of recombinant proteins in bacterial hosts are widespread and 
offer ease of use and purification of the recombinant product. However, use of these systems 
for the expression of eukaryotic proteins is often limited by problems of insolubility and lack 
of proper post-transcriptional and post-translational processing (see, e.g., U.S. Pat. No. 
5,721,121, incorporated herein by reference). Thus, eukaryotic expression systems are 
generally used for the expression of eukaryotic proteins. In particular, the pharmaceutical 
biotechnology industry relies heavily on the production of recombinant proteins in mammalian 
cells. These recombinantly produced proteins are essential to the therapeutic treatment of 
many diseases and conditions. In many cases, the market for these proteins exceeds a billion 
dollars a year. Examples of proteins produced recombinantly in mammalian cells include 
erythropoietin, factor VIII, factor IX, and insulin. In addition, recombinant antibodies are 
often used as therapeutic agents. Clinical applications of recombinantly produced proteins, in 
particular antibodies, often require large amounts of highly purified proteins. Proteins are 
generally produced in either mammalian cell culture or in transgenic animals. 

Vectors for transferring the gene of interest into mammalian cells are widely available, 
including plasmids, retroviral vectors, and adenoviral vectors. Retroviral vectors are widely 
used as vehicles for delivery of genes into mammalian cells (See, e.g., Vile and Russell, 
British Medical Bulletin, 51:12 [1995]). However, current methods for creating mammalian 
cell lines for expression of recombinant proteins suffer from several drawbacks (See, e.g., 
Mielke et al., Biochem. 35:2239-52 [1996]). Episomal systems allow for high expression 
levels of the recombinant protein, but are frequently only stable for a short time period (See, 
e.g., Klehr and Bode, Mol. Genet. (Life Sci. Adv.) 7:47-52 [1988]). Mammalian cell lines 
containing integrated exogenous genes are somewhat more stable, but there is increasing 
evidence that stability depends on the presence of only a few copies or even a single copy of 
the exogenous gene. Vectors are often unstable, resulting in a decrease in the level of 
protein expression over time. 

Based on overall product yield, expression of recombinant proteins in animals results 
in higher yields, relative to expression in cell culture (See, e.g., Werner et al., 
Arzneimittelforschung, 48:870 [1998]; Pollock et al., J. Immunol. Methods, 231:147 [1999]). 
However, expression in transgenic animals is limited by methods of producing transgenic 
mammals, variation in production and purity, and the life span of the animal. 

Thus, despite continued efforts in the field, vectors for high level, continuous 
expression of one or more proteins in a host cell remain needed in the art. 

SUMMARY OF THE INVENTION 

The present invention relates to novel regulatory elements and vectors for the 
expression of one or more proteins in a host cell. 

In some embodiments, the present invention provides a hybrid α-lactalbumin promoter 
comprising at least one portion derived from a first mammalian α-lactalbumin promoter and 
at least one portion derived from a second mammalian α-lactalbumin promoter. The present 
invention is not limited to portions derived from any particular α-lactalbumin promoter. 
Indeed, portions from a variety of α-lactalbumin promoters are contemplated, including, but 
not limited to, bovine, human, ovine, caprine, and murine α-lactalbumin promoters. In other 
embodiments, the present invention provides a nucleic acid comprising a nucleic acid 
sequence selected from the group consisting of SEQ ID NO:1 and sequences hybridizable to 
SEQ ID NO:1 under low stringency conditions, wherein the nucleic acid contains sequences 
derived from at least two mammalian sources and causes mammary specific gene expression. 
In still other embodiments, the present invention provides a nucleic acid sequence encoding a 
hybrid bovine/human alpha-lactalbumin (α-LA) promoter/enhancer (i.e., SEQ ID NO:1) and 
sequences that are hybridizable to a hybrid bovine/human α-LA promoter under low to high 
stringency conditions. In preferred embodiments, these sequences drive the expression of an 
exogenous gene in the mammary gland of a transgenic animal. In some embodiments, the 
hybridizable sequence comprises human and bovine elements. In other embodiments, the 
present invention provides a vector containing the nucleic acid sequence of the hybrid 
bovine/human α-LA promoter. In some embodiments, the vector is a retroviral vector. In 
still further embodiments, the present invention provides a host cell containing a vector 
containing a hybrid bovine/human α-LA promoter. 

The present invention also provides a nucleic acid encoding a mutant RNA export 
element (PPE element; SEQ ID NO:2) and sequences that are hybridizable to a mutant PPE 
element. In some embodiments, the sequences hybridizable to a mutant PPE element contain 
ATG sequences that have been mutated at at least one of the positions corresponding to 
nucleic acid residues 4, 112, 131, and 238 of the wild-type PPE element. In preferred 
embodiments, these sequences enhance the export from the nucleus of the RNA to which they 
are operably linked. In other embodiments, the present invention provides a vector containing 
the nucleic acid sequence of the mutant PPE element. In some embodiments, the vector is a 
retroviral vector. In still further embodiments, the present invention provides a host cell 
containing a vector that contains a mutant PPE element. 

The present invention also provides a nucleic acid encoding an IRES coding sequence 
and a signal peptide coding sequence, wherein said IRES and signal peptide coding sequences 
are adjacent to one another. In some embodiments, the IRES/signal peptide sequence 
comprises SEQ ID NO: 3 or SEQ ID NO: 12 and sequences that are hybridizable to these 
sequences under low stringency conditions. In preferred embodiments, these sequences 
interact with a ribosome and provide for the secretion of proteins to which they are operably 
linked. The present invention is not limited to any particular signal sequence peptide. 
Indeed, it is contemplated that a variety of signal peptides find use in the present invention. 
In some embodiments, the signal peptide sequence is selected from alpha-casein, human 
growth hormone, or α-lactalbumin signal peptide sequences. In other embodiments, the 
present invention provides a vector containing the nucleic acid sequence of the IRES/signal 
peptide sequence. In some embodiments, the vector is a retroviral vector. In still further 
embodiments, the present invention provides a host cell containing a vector that contains an 
IRES/signal peptide sequence. 

The present invention also provides methods for producing a protein of interest. In 
some embodiments, the methods comprise providing a host cell and a vector containing at 
least one exogenous gene operably linked to a bovine/human hybrid α-lactalbumin promoter 
and introducing the vector to the host cell under conditions such that the protein 
encoded by the exogenous gene is expressed. In some embodiments, the vector further 
contains a mutant RNA export element. In other embodiments, the vector contains at least 
two exogenous genes. In still further embodiments, the two or more exogenous genes are 
arranged in a polycistronic sequence separated by an internal ribosome entry site/bovine 
α-lactalbumin signal peptide. 

The present invention also provides methods for expressing at least two proteins in a 
polycistronic sequence. In some embodiments, the proteins are unrelated, while in other 
embodiments, the proteins are subunits of a multisubunit protein. In some preferred 
embodiments, the present invention provides methods for producing an immunoglobulin 
including providing a host cell and a vector comprising a first exogenous gene and a second 
exogenous gene, wherein the first exogenous gene encodes a first immunoglobulin chain and 
wherein the second exogenous gene encodes a second immunoglobulin chain, and wherein the 
first and the second genes are separated by an internal ribosome entry site, and introducing 
the vector to the host cell under conditions such that the first immunoglobulin chain and the 
second immunoglobulin chain encoded by the first and second exogenous genes are expressed. 
In some embodiments, the first immunoglobulin chain is an immunoglobulin light chain (e.g., 
λ or κ) and the second immunoglobulin chain is an immunoglobulin heavy chain (e.g., γ, α, 
μ, δ, or ε). In other embodiments, the first immunoglobulin chain is an immunoglobulin 
heavy chain (e.g., γ, α, μ, δ, or ε) and the second immunoglobulin chain is an 
immunoglobulin light chain (e.g., λ or κ). In some embodiments, the vector is a retroviral 
vector. In other embodiments, the vector further contains a bovine α-lactalbumin signal 
peptide. In still further embodiments, the vector further contains a bovine/human hybrid 
α-lactalbumin promoter. In yet other embodiments, the first immunoglobulin chain and the 
second immunoglobulin chain are expressed at a ratio of about 0.9:1.1 to 1:1. The present 
invention also provides immunoglobulins produced by the methods described herein. The 
present invention is not limited to the use of any particular vector. Indeed, it is contemplated 
that a variety of vectors find use in the present invention, including, but not limited to, 
plasmid and retroviral vectors. In some preferred embodiments, the retroviral vector is 
pseudotyped. 

In still further embodiments, the present invention provides methods of indirectly 
detecting the expression of a protein of interest comprising providing a host cell transduced or 
transfected with a vector encoding a polycistronic sequence, wherein the polycistronic 
sequence comprises a signal protein and a protein of interest operably linked by an IRES, and 
culturing the host cells under conditions such that the signal protein and protein of interest are 
produced, wherein the presence of the signal protein indicates the presence of the protein of 
interest. The methods of the present invention are not limited to the expression of any 
particular protein of interest. Indeed, the expression of a variety of proteins of interest is 
contemplated, including, but not limited to, G-protein coupled receptors. The present 
invention is not limited to the use of any particular signal protein. Indeed, the use of a variety 
of signal proteins is contemplated, including, but not limited to, immunoglobulin heavy and 
light chains, beta-galactosidase, beta-lactamase, green fluorescent protein, and luciferase. In 
particularly preferred embodiments, expression of the signal protein and protein of interest is 
driven by the same promoter and the signal protein and protein of interest are transcribed as a 
single transcriptional unit. 

DESCRIPTION OF THE FIGURES 

Figure 1 is a Western blot of a 15% SDS-PAGE gel run under denaturing conditions 
and probed with anti-human IgG (Fc) and anti-human IgG (kappa). 

Figure 2 is a graph of MN14 expression over time. 

Figure 3 is a Western blot of a 15% PAGE run under non-denaturing conditions and 
probed with anti-human IgG (Fc) and anti-human IgG (kappa). 

Figure 4 provides the sequence for the hybrid human-bovine alpha-lactalbumin 
promoter (SEQ ID NO:1). 

Figure 5 provides the sequence for the mutated PPE sequence (SEQ ID NO:2). 

Figure 6 provides the sequence for the IRES-Signal peptide sequence (SEQ ID NO:3). 

Figures 7a and 7b provide the sequence for the CMV MN14 vector (SEQ ID NO:4). 

Figures 8a and 8b provide the sequence for the CMV LL2 vector (SEQ ID NO:5). 

Figures 9a-c provide the sequence for the MMTV MN14 vector (SEQ ID NO:6). 

Figures 10a-d provide the sequence for the alpha-lactalbumin MN14 vector (SEQ ID 
NO:7). 

Figures 11a-c provide the sequence for the alpha-lactalbumin Bot vector (SEQ ID 
NO:8). 

Figures 12a-b provide the sequence for the LSRNL vector (SEQ ID NO:9). 

Figures 13a-b provide the sequence for the alpha-lactalbumin cc49IL2 vector (SEQ ID 
NO:10). 

Figures 14a-c provide the sequence for the alpha-lactalbumin YP vector (SEQ ID 
NO:11). 

Figure 15 provides the sequence for the IRES-Casein signal peptide sequence (SEQ ID 
NO:12). 

Figures 16a-c provide the sequence for the LNBOTDC vector (SEQ ID NO:13). 

Figures 17a-d provide the sequence of a retroviral vector that expresses a G-protein 
coupled receptor and antibody light chain. 

DEFINITIONS 

To facilitate understanding of the invention, a number of terms are defined below. 

As used herein, the term "host cell" refers to any eukaryotic cell (e.g., mammalian 
cells, avian cells, amphibian cells, plant cells, fish cells, and insect cells), whether located in 
vitro or in vivo. 

As used herein, the term "cell culture" refers to any in vitro culture of cells. Included 
within this term are continuous cell lines (e.g., with an immortal phenotype), primary cell 
cultures, finite cell lines (e.g., non-transformed cells), and any other cell population 
maintained in vitro, including oocytes and embryos. 

As used herein, the term "vector" refers to any genetic element, such as a plasmid, 
phage, transposon, cosmid, chromosome, virus, virion, etc., which is capable of replication 
when associated with the proper control elements and which can transfer gene sequences 
between cells. Thus, the term includes cloning and expression vehicles, as well as viral 
vectors. 



As used herein, the term "integrating vector" refers to a vector whose integration or 
insertion into a nucleic acid (e.g., a chromosome) is accomplished via an integrase. Examples 
of "integrating vectors" include, but are not limited to, retroviral vectors, transposons, and 
adeno-associated virus vectors. 

As used herein, the term "integrated" refers to a vector that is stably inserted into the 
genome (i.e., into a chromosome) of a host cell. 

As used herein, the term "multiplicity of infection" or "MOI" refers to the ratio of 
integrating vectors:host cells used during transfection or transduction of host cells. For 
example, if 1,000,000 vectors are used to transduce 100,000 host cells, the multiplicity of 
infection is 10. The use of this term is not limited to events involving transduction, but 
instead encompasses introduction of a vector into a host by methods such as lipofection, 
microinjection, calcium phosphate precipitation, and electroporation. 
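
The MOI arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration of the ratio as defined here; the function name and its argument names are hypothetical and do not come from the specification.

```python
def multiplicity_of_infection(vector_count: int, host_cell_count: int) -> float:
    """Return the ratio of integrating vectors to host cells (the MOI).

    Hypothetical helper for illustration; the definition only fixes the ratio.
    """
    if host_cell_count <= 0:
        raise ValueError("host_cell_count must be positive")
    if vector_count < 0:
        raise ValueError("vector_count must not be negative")
    return vector_count / host_cell_count


# The worked example from the definition: 1,000,000 vectors, 100,000 host cells.
moi = multiplicity_of_infection(1_000_000, 100_000)
print(moi)  # 10.0
```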

As used herein, the term "genome" refers to the genetic material (e.g., chromosomes) 
of an organism. 

The term "nucleotide sequence of interest" refers to any nucleotide sequence (e.g., 
RNA or DNA), the manipulation of which may be deemed desirable for any reason (e.g., treat 
disease, confer improved qualities, expression of a protein of interest in a host cell, etc.), by 
one of ordinary skill in the art. Such nucleotide sequences include, but are not limited to, 
coding sequences of structural genes (e.g., reporter genes, selection marker genes, oncogenes, 
drug resistance genes, growth factors, etc.), and non-coding regulatory sequences which do not 
encode an mRNA or protein product (e.g., promoter sequence, polyadenylation sequence, 
termination sequence, enhancer sequence, etc.). 

As used herein, the term "protein of interest" refers to a protein encoded by a nucleic 
acid of interest. 

As used herein, the term "signal protein" refers to a protein that is co-expressed with a 
protein of interest and which, when detected by a suitable assay, provides indirect evidence of 
expression of the protein of interest. Examples of signal protein useful in the present 
invention include, but are not limited to, immunoglobulin heavy and light chains, beta- 
galactosidase, beta-lactamase, green fluorescent protein, and luciferase. 

As used herein, the term "exogenous gene" refers to a gene that is not naturally 
present in a host organism or cell, or is artificially introduced into a host organism or cell. 

The term "gene" refers to a nucleic acid (e.g., DNA or RNA) sequence that comprises 
coding sequences necessary for the production of a polypeptide or precursor (e.g., proinsulin). 
The polypeptide can be encoded by a full length coding sequence or by any portion of the 
coding sequence so long as the desired activity or functional properties (e.g., enzymatic 
activity, ligand binding, signal transduction, etc.) of the full-length or fragment are retained. 
The term also encompasses the coding region of a structural gene and includes sequences 
located adjacent to the coding region on both the 5' and 3' ends for a distance of about 1 kb 
or more on either end such that the gene corresponds to the length of the full-length mRNA. 
The sequences that are located 5' of the coding region and which are present on the mRNA 
are referred to as 5' untranslated sequences. The sequences that are located 3' or downstream 
of the coding region and which are present on the mRNA are referred to as 3' untranslated 
sequences. The term "gene" encompasses both cDNA and genomic forms of a gene. A 
genomic form or clone of a gene contains the coding region interrupted with non-coding 
sequences termed "introns" or "intervening regions" or "intervening sequences." Introns are 
segments of a gene which are transcribed into nuclear RNA (hnRNA); introns may contain 
regulatory elements such as enhancers. Introns are removed or "spliced out" from the nuclear 
or primary transcript; introns therefore are absent in the messenger RNA (mRNA) transcript. 
The mRNA functions during translation to specify the sequence or order of amino acids in a 
nascent polypeptide. 

As used herein, the term "gene expression" refers to the process of converting genetic 
information encoded in a gene into RNA (e.g., mRNA, rRNA, tRNA, or snRNA) through 
"transcription" of the gene (i.e., via the enzymatic action of an RNA polymerase), and for 
protein encoding genes, into protein through "translation" of mRNA. Gene expression can be 
regulated at many stages in the process. "Up-regulation" or "activation" refers to regulation 
that increases the production of gene expression products (i.e., RNA or protein), while "down- 
regulation" or "repression" refers to regulation that decreases production. Molecules (e.g., 
transcription factors) that are involved in up-regulation or down-regulation are often called 
"activators" and "repressors," respectively. 

Where "amino acid sequence" is recited herein to refer to an amino acid sequence of a 
naturally occurring protein molecule, "amino acid sequence" and like terms, such as 
"polypeptide" or "protein" are not meant to limit the amino acid sequence to the complete, 
native amino acid sequence associated with the recited protein molecule. 

As used herein, the terms "nucleic acid molecule encoding," "DNA sequence 
encoding," "DNA encoding," "RNA sequence encoding," and "RNA encoding" refer to the 
order or sequence of deoxyribonucleotides or ribonucleotides along a strand of 
deoxyribonucleic acid or ribonucleic acid. The order of these deoxyribonucleotides or 
ribonucleotides determines the order of amino acids along the polypeptide (protein) chain. 
The DNA or RNA sequence thus codes for the amino acid sequence. 

As used herein, the term "variant," when used in reference to a protein, refers to 
proteins encoded by partially homologous nucleic acids so that the amino acid sequence of the 
proteins varies. As used herein, the term "variant" encompasses proteins encoded by 
homologous genes having both conservative and nonconservative amino acid substitutions that 
do not result in a change in protein function, as well as proteins encoded by homologous 
genes having amino acid substitutions that cause decreased (e.g., null mutations) protein 
function or increased protein function. 

As used herein, the terms "complementary" or "complementarity" are used in reference 
to polynucleotides (i.e., a sequence of nucleotides) related by the base-pairing rules. For 
example, the sequence "A-G-T" is complementary to the sequence "T-C-A." 
Complementarity may be "partial," in which only some of the nucleic acids' bases are 
matched according to the base pairing rules. Or, there may be "complete" or "total" 
complementarity between the nucleic acids. The degree of complementarity between nucleic 
acid strands has significant effects on the efficiency and strength of hybridization between 
nucleic acid strands. This is of particular importance in amplification reactions, as well as 
detection methods that depend upon binding between nucleic acids. 
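
The base-pairing rules and the distinction between partial and complete complementarity can be illustrated with a short Python sketch. It assumes DNA bases only and compares the two strands position by position, exactly as the "A-G-T"/"T-C-A" example is written; the function names are hypothetical.

```python
# Watson-Crick pairing for DNA; RNA (U) is deliberately left out of this sketch.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Return the base-by-base complement of a DNA sequence."""
    return "".join(PAIRS[base] for base in seq.upper())


def fraction_matched(strand_a: str, strand_b: str) -> float:
    """Fraction of aligned positions whose bases pair according to PAIRS.

    1.0 corresponds to "complete" complementarity; anything lower is "partial".
    """
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must have the same length")
    if not strand_a:
        raise ValueError("strands must not be empty")
    matches = sum(PAIRS[a] == b for a, b in zip(strand_a.upper(), strand_b.upper()))
    return matches / len(strand_a)


print(complement("AGT"))               # TCA
print(fraction_matched("AGT", "TCA"))  # 1.0 (complete)
print(fraction_matched("AGT", "TCG") < 1.0)  # True (partial: two of three bases pair)
```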

The terms "homology" and "percent identity" when used in relation to nucleic acids 
refer to a degree of complementarity. There may be partial homology (i.e., partial identity) 
or complete homology (i.e., complete identity). A partially complementary sequence is one 
that at least partially inhibits a completely complementary sequence from hybridizing to a 
target nucleic acid sequence and is referred to using the functional term "substantially 
homologous." The inhibition of hybridization of the completely complementary sequence to 
the target sequence may be examined using a hybridization assay (Southern or Northern blot, 
solution hybridization and the like) under conditions of low stringency. A substantially 
homologous sequence or probe (i.e., an oligonucleotide which is capable of hybridizing to 
another oligonucleotide of interest) will compete for and inhibit the binding (i.e., the 
hybridization) of a completely homologous sequence to a target sequence under conditions of 
low stringency. This is not to say that conditions of low stringency are such that non-specific 
binding is permitted; low stringency conditions require that the binding of two sequences to 
one another be a specific (i.e., selective) interaction. The absence of non-specific binding 
may be tested by the use of a second target which lacks even a partial degree of 
complementarity (e.g., less than about 30% identity); in the absence of non-specific binding 
the probe will not hybridize to the second non-complementary target. 

The art knows well that numerous equivalent conditions may be employed to comprise 
low stringency conditions; factors such as the length and nature (DNA, RNA, base 
composition) of the probe and nature of the target (DNA, RNA, base composition, present in 
solution or immobilized, etc.) and the concentration of the salts and other components (e.g., 
the presence or absence of formamide, dextran sulfate, polyethylene glycol) are considered 
and the hybridization solution may be varied to generate conditions of low stringency 
hybridization different from, but equivalent to, the above listed conditions. In addition, the 
art knows conditions that promote hybridization under conditions of high stringency (e.g., 
increasing the temperature of the hybridization and/or wash steps, the use of formamide in the 
hybridization solution, etc.). 

When used in reference to a double-stranded nucleic acid sequence such as a cDNA or 
genomic clone, the term "substantially homologous" refers to any probe that can hybridize to 
either or both strands of the double-stranded nucleic acid sequence under conditions of low 
stringency as described above. 

When used in reference to a single-stranded nucleic acid sequence, the term 
"substantially homologous" refers to any probe that can hybridize to (i.e., it is the complement 
of) the single-stranded nucleic acid sequence under conditions of low stringency as described 
above. 

As used herein, the term "hybridization" is used in reference to the pairing of 
complementary nucleic acids. Hybridization and the strength of hybridization (i.e., the 
strength of the association between the nucleic acids) are impacted by such factors as the 
degree of complementarity between the nucleic acids, stringency of the conditions involved, the 
Tm of the formed hybrid, and the G:C ratio within the nucleic acids. A single molecule that 
contains pairing of complementary nucleic acids within its structure is said to be 
"self-hybridized." 

As used herein, the term "Tm" is used in reference to the "melting temperature" of a 
nucleic acid. The melting temperature is the temperature at which a population of 
double-stranded nucleic acid molecules becomes half dissociated into single strands. The equation for 
calculating the Tm of nucleic acids is well known in the art. As indicated by standard 
references, a simple estimate of the Tm value may be calculated by the equation: Tm = 81.5 + 
0.41 (% G + C), when a nucleic acid is in aqueous solution at 1 M NaCl (See, e.g., Anderson 
and Young, Quantitative Filter Hybridization, in Nucleic Acid Hybridization [1985]). Other 
references include more sophisticated computations that take structural as well as sequence 
characteristics into account for the calculation of Tm. 
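
As a minimal sketch, the simple estimate Tm = 81.5 + 0.41 (% G + C) can be computed directly from a sequence. The helper names below are hypothetical, the result is taken to be in degrees Celsius, and the sketch applies only to the stated condition of aqueous solution at 1 M NaCl; it makes no attempt at the more sophisticated computations mentioned above.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a nucleic acid sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    upper = seq.upper()
    gc_count = upper.count("G") + upper.count("C")
    return 100.0 * gc_count / len(upper)


def estimate_tm(seq: str) -> float:
    """Simple Tm estimate: 81.5 + 0.41 * (% G + C), for 1 M NaCl."""
    return 81.5 + 0.41 * gc_percent(seq)


# A sequence with 50% G + C gives an estimate of roughly 102.
print(round(estimate_tm("ATGCATGC"), 1))
```

Because the estimate depends only on base composition, two sequences with the same G + C percentage receive the same value regardless of length or base order.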

As used herein the term "stringency" is used in reference to the conditions of 
temperature, ionic strength, and the presence of other compounds such as organic solvents, 
under which nucleic acid hybridizations are conducted. With "high stringency" conditions, 
nucleic acid base pairing will occur only between nucleic acid fragments that have a high 
frequency of complementary base sequences. Thus, conditions of "weak" or "low" stringency 
are often required with nucleic acids that are derived from organisms that are genetically 
diverse, as the frequency of complementary sequences is usually less. 

"High stringency conditions" when used in reference to nucleic acid hybridization 
comprise conditions equivalent to binding or hybridization at 42°C in a solution consisting of 
5X SSPE (43.8 g/l NaCl, 6.9 g/l NaH2PO4·H2O and 1.85 g/l EDTA, pH adjusted to 7.4 with 
NaOH), 0.5% SDS, 5X Denhardt's reagent and 100 μg/ml denatured salmon sperm DNA 
followed by washing in a solution comprising 0.1X SSPE, 1.0% SDS at 42°C when a probe 
of about 500 nucleotides in length is employed. 

"Medium stringency conditions" when used in reference to nucleic acid hybridization 
comprise conditions equivalent to binding or hybridization at 42°C in a solution consisting of 
5X SSPE (43.8 g/l NaCl, 6.9 g/l NaH2PO4·H2O and 1.85 g/l EDTA, pH adjusted to 7.4 with 
NaOH), 0.5% SDS, 5X Denhardt's reagent and 100 μg/ml denatured salmon sperm DNA 
followed by washing in a solution comprising 1.0X SSPE, 1.0% SDS at 42°C when a probe 
of about 500 nucleotides in length is employed. 

"Low stringency conditions" comprise conditions equivalent to binding or hybridization 
at 42°C in a solution consisting of 5X SSPE (43.8 g/l NaCl, 6.9 g/l NaH2PO4·H2O and 1.85 
g/l EDTA, pH adjusted to 7.4 with NaOH), 0.1% SDS, 5X Denhardt's reagent [50X 
Denhardt's contains per 500 ml: 5 g Ficoll (Type 400, Pharmacia), 5 g BSA (Fraction V; 
Sigma)] and 100 μg/ml denatured salmon sperm DNA followed by washing in a solution 
comprising 5X SSPE, 0.1% SDS at 42°C when a probe of about 500 nucleotides in length is 
employed. 
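
The three definitions above share the hybridization temperature (42°C), the 5X SSPE hybridization buffer, 5X Denhardt's reagent, 100 μg/ml salmon sperm DNA, and the probe length of about 500 nucleotides. They differ only in SDS concentration and in the wash solution. The following Python sketch tabulates those differences; the class and field names are hypothetical, and ordering the conditions by wash salt concentration reflects the general rule that a lower-salt wash is more stringent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stringency:
    """Parameters that differ between the defined stringency conditions."""
    hybridization_sds_percent: float
    wash_sspe_x: float        # SSPE concentration of the wash, in "X" units
    wash_sds_percent: float


# Values transcribed from the three definitions; everything else is shared.
CONDITIONS = {
    "high": Stringency(hybridization_sds_percent=0.5, wash_sspe_x=0.1, wash_sds_percent=1.0),
    "medium": Stringency(hybridization_sds_percent=0.5, wash_sspe_x=1.0, wash_sds_percent=1.0),
    "low": Stringency(hybridization_sds_percent=0.1, wash_sspe_x=5.0, wash_sds_percent=0.1),
}


def most_to_least_stringent() -> list[str]:
    """Order the named conditions from lowest to highest wash salt."""
    return sorted(CONDITIONS, key=lambda name: CONDITIONS[name].wash_sspe_x)


print(most_to_least_stringent())  # ['high', 'medium', 'low']
```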

A gene may produce multiple RNA species that are generated by differential splicing 
of the primary RNA transcript. cDNAs that are splice variants of the same gene will contain 
regions of sequence identity or complete homology (representing the presence of the same 
exon or portion of the same exon on both cDNAs) and regions of complete non-identity (for 
example, representing the presence of exon "A" on cDNA 1 wherein cDNA 2 contains exon 
"B" instead). Because the two cDNAs contain regions of sequence identity they will both 
hybridize to a probe derived from the entire gene or portions of the gene containing sequences 
found on both cDNAs; the two splice variants are therefore substantially homologous to such 
a probe and to each other. 

The terms "in operable combination," "in operable order," and "operably linked" as 
used herein refer to the linkage of nucleic acid sequences in such a manner that a nucleic acid 
molecule capable of directing the transcription of a given gene and/or the synthesis of a 
desired protein molecule is produced. The term also refers to the linkage of amino acid 
sequences in such a manner so that a functional protein is produced. 

As used herein, the term "selectable marker" refers to a gene that encodes an 
enzymatic activity that confers the ability to grow in medium lacking what would otherwise 
be an essential nutrient (e.g., the HIS3 gene in yeast cells); in addition, a selectable marker 
may confer resistance to an antibiotic or drug upon the cell in which the selectable marker is 
expressed. Selectable markers may be "dominant"; a dominant selectable marker encodes an 
enzymatic activity that can be detected in any eukaryotic cell line. Examples of dominant 
selectable markers include the bacterial aminoglycoside 3' phosphotransferase gene (also 
referred to as the neo gene) that confers resistance to the drug G418 in mammalian cells, the 
bacterial hygromycin G phosphotransferase (hyg) gene that confers resistance to the antibiotic 
hygromycin and the bacterial xanthine-guanine phosphoribosyl transferase gene (also referred 
to as the gpt gene) that confers the ability to grow in the presence of mycophenolic acid. 
Other selectable markers are not dominant in that their use must be in conjunction with a cell 
line that lacks the relevant enzyme activity. Examples of non-dominant selectable markers 
include the thymidine kinase (tk) gene that is used in conjunction with tk⁻ cell lines, the CAD 
gene which is used in conjunction with CAD-deficient cells and the mammalian 
hypoxanthine-guanine phosphoribosyl transferase (hprt) gene which is used in conjunction 
with hprt⁻ cell lines. A review of the use of selectable markers in mammalian cell lines is 
provided in Sambrook, J. et al., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold 
Spring Harbor Laboratory Press, New York (1989) pp. 16.9-16.15. 

As used herein, the term "regulatory element" refers to a genetic element which 
controls some aspect of the expression of nucleic acid sequences. For example, a promoter is 
a regulatory element that facilitates the initiation of transcription of an operably linked coding 
region. Other regulatory elements are splicing signals, polyadenylation signals, termination 
signals, RNA export elements, internal ribosome entry sites, etc. (defined infra). 

Transcriptional control signals in eukaryotes comprise "promoter" and "enhancer" 
elements. Promoters and enhancers consist of short arrays of DNA sequences that interact 
specifically with cellular proteins involved in transcription (Maniatis et al., Science 236:1237 
[1987]). Promoter and enhancer elements have been isolated from a variety of eukaryotic 
sources including genes in yeast, insect and mammalian cells, and viruses (analogous control 
elements, i.e., promoters, are also found in prokaryotes). The selection of a particular 
promoter and enhancer depends on what cell type is to be used to express the protein of 
interest. Some eukaryotic promoters and enhancers have a broad host range while others are 
functional in a limited subset of cell types (for review see, Voss et al., Trends Biochem. Sci., 
11:287 [1986]; and Maniatis et al., supra). For example, the SV40 early gene enhancer is 
very active in a wide variety of cell types from many mammalian species and has been widely 
used for the expression of proteins in mammalian cells (Dijkema et al., EMBO J. 4:761 
[1985]). Two other examples of promoter/enhancer elements active in a broad range of 
mammalian cell types are those from the human elongation factor 1α gene (Uetsuki et al., J. 
Biol. Chem., 264:5791 [1989]; Kim et al., Gene 91:217 [1990]; and Mizushima and Nagata, 
Nuc. Acids. Res., 18:5322 [1990]) and the long terminal repeats of the Rous sarcoma virus 
(Gorman et al., Proc. Natl. Acad. Sci. USA 79:6777 [1982]) and the human cytomegalovirus 
(Boshart et al., Cell 41:521 [1985]). 

As used herein, the term "promoter/enhancer" denotes a segment of DNA which 
contains sequences capable of providing both promoter and enhancer functions (i.e., the 
functions provided by a promoter element and an enhancer element, see above for a 
discussion of these functions). For example, the long terminal repeats of retroviruses contain 
both promoter and enhancer functions. The enhancer/promoter may be "endogenous" or 
"exogenous" or "heterologous." An "endogenous" enhancer/promoter is one which is naturally 
linked with a given gene in the genome. An "exogenous" or "heterologous" 
enhancer/promoter is one which is placed in juxtaposition to a gene by means of genetic 
manipulation (i.e., molecular biological techniques such as cloning and recombination) such 
that transcription of that gene is directed by the linked enhancer/promoter. 

Regulatory elements may be tissue specific or cell specific. The term "tissue specific" 
as it applies to a regulatory element refers to a regulatory element that is capable of directing 
selective expression of a nucleotide sequence of interest to a specific type of tissue (e.g., 
liver) in the relative absence of expression of the same nucleotide sequence of interest in a 
different type of tissue (e.g., lung). 

Tissue specificity of a regulatory element may be evaluated by, for example, operably 
linking a reporter gene to a promoter sequence (which is not tissue-specific) and to the 
regulatory element to generate a reporter construct, introducing the reporter construct into the 
genome of an animal such that the reporter construct is integrated into every tissue of the 
resulting transgenic animal, and detecting the expression of the reporter gene (e.g., detecting 
mRNA, protein, or the activity of a protein encoded by the reporter gene) in different tissues 
of the transgenic animal. The detection of a greater level of expression of the reporter gene 
in one or more tissues relative to the level of expression of the reporter gene in other tissues 
shows that the regulatory element is "specific" for the tissues in which greater levels of 
expression are detected. Thus, the term "tissue-specific" (e.g., liver-specific) as used herein is 
a relative term that does not require absolute specificity of expression. In other words, the 
term "tissue-specific" does not require that one tissue have extremely high levels of expression 
and another tissue have no expression. It is sufficient that expression is greater in one tissue 
than another. By contrast, "strict" or "absolute" tissue-specific expression is meant to indicate 
expression in a single tissue type (e.g., liver) with no detectable expression in other tissues. 
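
By way of illustration only, the relative and strict senses of "tissue-specific" described above can be sketched in Python. The function name, the dictionary of reporter levels, and the detection_limit parameter are hypothetical and are not part of the disclosure; the sketch assumes reporter expression has already been quantified in at least two tissues and, for simplicity, reports only the single highest-expressing tissue.

```python
def classify_tissue_specificity(expression, detection_limit=0.0):
    """Classify a regulatory element from reporter levels per tissue.

    expression: dict mapping tissue name -> measured reporter level
                (hypothetical, arbitrary units).
    detection_limit: level at or below which expression is treated as
                     undetectable (an assumed assay parameter).
    Returns (category, tissue), where category is one of
    "strict", "relative", "not specific", or "none".
    """
    if len(expression) < 2:
        raise ValueError("at least two tissues are needed for a comparison")

    top_tissue = max(expression, key=expression.get)
    top_level = expression[top_tissue]
    other_levels = [v for t, v in expression.items() if t != top_tissue]

    if top_level <= detection_limit:
        return ("none", None)          # no detectable expression anywhere
    if all(v <= detection_limit for v in other_levels):
        return ("strict", top_tissue)  # a single tissue, nothing elsewhere
    if any(v < top_level for v in other_levels):
        return ("relative", top_tissue)  # greater in one tissue than another
    return ("not specific", None)      # equal expression in all tissues


if __name__ == "__main__":
    levels = {"liver": 120.0, "lung": 15.0, "kidney": 10.0}
    print(classify_tissue_specificity(levels))
```

Under these assumptions, an element scoring "relative" for liver satisfies the definition of "liver-specific" used herein, while "strict" corresponds to absolute tissue-specific expression.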

The term "cell type specific" as applied to a regulatory element refers to a regulatory 
element which is capable of directing selective expression of a nucleotide sequence of interest 
in a specific type of cell in the relative absence of expression of the same nucleotide sequence 
of interest in a different type of cell within the same tissue. The term "cell type specific" 
when applied to a regulatory element also means a regulatory element capable of promoting 
selective expression of a nucleotide sequence of interest in a region within a single tissue. 

Cell type specificity of a regulatory element may be assessed using methods well 
known in the art (e.g., immunohistochemical staining and/or Northern blot analysis). Briefly, 
for immunohistochemical staining, tissue sections are embedded in paraffin, and paraffin 
sections are reacted with a primary antibody specific for the polypeptide product encoded by 
the nucleotide sequence of interest whose expression is regulated by the regulatory element. 
A labeled (e.g., peroxidase conjugated) secondary antibody specific for the primary antibody 
is allowed to bind to the sectioned tissue and specific binding detected (e.g., with 
avidin/biotin) by microscopy. Briefly, for Northern blot analysis, RNA is isolated from cells 
and electrophoresed on agarose gels to fractionate the RNA according to size followed by 
transfer of the RNA from the gel to a solid support (e.g., nitrocellulose or a nylon 
membrane). The immobilized RNA is then probed with a labeled oligo-deoxyribonucleotide 
probe or DNA probe to detect RNA species complementary to the probe used. Northern blots 
are a standard tool of molecular biologists. 

The term "promoter," "promoter element," or "promoter sequence" as used herein, 
refers to a DNA sequence which when ligated to a nucleotide sequence of interest is capable 
of controlling the transcription of the nucleotide sequence of interest into mRNA. A promoter 
is typically, though not necessarily, located 5' (i.e., upstream) of a nucleotide sequence of 
interest whose transcription into mRNA it controls, and provides a site for specific binding by 
RNA polymerase and other transcription factors for initiation of transcription. 

Promoters may be constitutive or regulatable. The term "constitutive" when made in 
reference to a promoter means that the promoter is capable of directing transcription of an 
operably linked nucleic acid sequence in the absence of a stimulus (e.g., heat shock, 
chemicals, etc.). In contrast, a "regulatable" promoter is one which is capable of directing a 
level of transcription of an operably linked nucleic acid sequence in the presence of a stimulus 
(e.g., heat shock, chemicals, etc.) which is different from the level of transcription of the 
operably linked nucleic acid sequence in the absence of the stimulus. 

The presence of "splicing signals" on an expression vector often results in higher levels 
of expression of the recombinant transcript. Splicing signals mediate the removal of introns 
from the primary RNA transcript and consist of a splice donor and acceptor site (Sambrook et 
al., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory 
Press, New York [1989], pp. 16.7-16.8). A commonly used splice donor and acceptor site is 
the splice junction from the 16S RNA of SV40. 

Efficient expression of recombinant DNA sequences in eukaryotic cells requires 
expression of signals directing the efficient termination and polyadenylation of the resulting 
transcript. Transcription termination signals are generally found downstream of the 
polyadenylation signal and are a few hundred nucleotides in length. The term "poly A site" 
or "poly A sequence" as used herein denotes a DNA sequence that directs both the 
termination and polyadenylation of the nascent RNA transcript. Efficient polyadenylation of 
the recombinant transcript is desirable as transcripts lacking a poly A tail are unstable and are 
rapidly degraded. The poly A signal utilized in an expression vector may be "heterologous" 
or "endogenous." An endogenous poly A signal is one that is found naturally at the 3' end of 
the coding region of a given gene in the genome. A heterologous poly A signal is one that is 
isolated from one gene and placed 3' of another gene. A commonly used heterologous poly 
A signal is the SV40 poly A signal. The SV40 poly A signal is contained on a 237 bp 
BamHI/BclI restriction fragment and directs both termination and polyadenylation (Sambrook, 
supra, at 16.6-16.7). 

Eukaryotic expression vectors may also contain "viral replicons" or "viral origins of 
replication." Viral replicons are viral DNA sequences that allow for the extrachromosomal 
replication of a vector in a host cell expressing the appropriate replication factors. Vectors 
that contain either the SV40 or polyoma virus origin of replication replicate to "high copy 
number" (up to 10^4 copies/cell) in cells that express the appropriate viral T antigen. Vectors 
that contain the replicons from bovine papillomavirus or Epstein-Barr virus replicate 
extrachromosomally at "low copy number" (~100 copies/cell). However, it is not intended that 
expression vectors be limited to any particular viral origin of replication. 

As used herein, the term "long terminal repeat" or "LTR" refers to transcriptional 
control elements located in or isolated from the U3 region 5' and 3' of a retroviral genome. 
As is known in the art, long terminal repeats may be used as control elements in retroviral 
vectors, or isolated from the retroviral genome and used to control expression from other 
types of vectors. 

As used herein, the term "secretion signal" refers to any DNA sequence which 
when operably linked to a recombinant DNA sequence encodes a signal peptide which is 
capable of causing the secretion of the recombinant polypeptide. In general, the signal 
peptides comprise a series of about 15 to 30 hydrophobic amino acid residues (See, e.g., 
Zwizinski et al., J. Biol. Chem. 255(16): 7973-77 [1980], Gray et al., Gene 39(2): 247-54 
[1985], and Martial et al., Science 205: 602-607 [1979]). Such secretion signal sequences are 
preferably derived from genes encoding polypeptides secreted from the cell type targeted for 
tissue-specific expression (e.g., secreted milk proteins for expression in and secretion from 
mammary secretory cells). Secretory DNA sequences, however, are not limited to such 
sequences. Secretory DNA sequences from proteins secreted from many cell types and 
organisms may also be used (e.g., the secretion signals for t-PA, serum albumin, lactoferrin, 
and growth hormone, and secretion signals from microbial genes encoding secreted 
polypeptides such as from yeast, filamentous fungi, and bacteria). 

As used herein, the terms "RNA export element" or "Pre-mRNA Processing Enhancer 
(PPE)" refer to 3' and 5' cis-acting post-transcriptional regulatory elements that enhance 
export of RNA from the nucleus. "PPE" elements include, but are not limited to, Mertz 
sequences (described in U.S. Pat. Nos. 5,914,267 and 5,686,120, both of which are incorporated 
herein by reference) and woodchuck mRNA processing enhancer (WPRE; WO99/14310 and 
U.S. Pat. No. 6,136,597, each of which is incorporated herein by reference). 

As used herein, the term "polycistronic" refers to an mRNA encoding more than 
one polypeptide chain (See, e.g., WO 93/03143, WO 88/05486, and European Pat. No. 117058, all 
of which are incorporated herein by reference). Likewise, the term "arranged in polycistronic 
sequence" refers to the arrangement of genes encoding two different polypeptide chains in a 
single mRNA. 

As used herein, the term "internal ribosome entry site" or "IRES" refers to a sequence 
located between polycistronic genes that permits the production of the expression 
product originating from the second gene by internal initiation of the translation of the 
dicistronic mRNA. Examples of internal ribosome entry sites include, but are not limited to, 
those derived from foot and mouth disease virus (FDV), encephalomyocarditis virus, 
poliovirus and RDV (Scheper et al., Biochem. 76: 801-809 [1994]; Meyer et al., J. Virol. 69: 
2819-2824 [1995]; Jang et al., 1988, J. Virol. 62: 2636-2643 [1998]; Haller et al., J. Virol. 
66: 5075-5086 [1995]). Vectors incorporating IRES's may be assembled as is known in the 
art. For example, a retroviral vector containing a polycistronic sequence may contain the 
following elements in operable association: nucleotide polylinker, gene of interest, an internal 
ribosome entry site and a mammalian selectable marker or another gene of interest. The 
polycistronic cassette is situated within the retroviral vector between the 5' LTR and the 3' 
LTR at a position such that transcription from the 5' LTR promoter transcribes the 
polycistronic message cassette. The transcription of the polycistronic message cassette may 
also be driven by an internal promoter (e.g., cytomegalovirus promoter) or an inducible 
promoter, which may be preferable depending on the use. The polycistronic message cassette 
can further comprise a cDNA or genomic DNA (gDNA) sequence operatively associated 
within the polylinker. Any mammalian selectable marker can be utilized as the polycistronic 
message cassette mammalian selectable marker. Such mammalian selectable markers are well 
known to those of skill in the art and can include, but are not limited to, kanamycin/G418, 
hygromycin B or mycophenolic acid resistance markers. 
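
For purposes of illustration, the arrangement of elements in such a polycistronic retroviral vector can be modeled as an ordered list and checked for the layout described above. The following Python sketch is a simplification under stated assumptions: the Element class, the kind labels, and the is_valid_cassette function are hypothetical names, and the check covers only element order (LTRs at the ends, each IRES directly between two coding elements), not sequence content or the polylinker.

```python
from dataclasses import dataclass

# Element kinds treated as translated coding regions in this sketch.
CODING = {"gene", "marker"}


@dataclass(frozen=True)
class Element:
    kind: str  # "5' LTR", "promoter", "gene", "IRES", "marker", or "3' LTR"
    name: str  # free-text label, for display only


def is_valid_cassette(elements):
    """Return True if the elements follow the polycistronic layout:
    5' LTR first, 3' LTR last, and every IRES flanked on both sides
    by a coding element (gene of interest or selectable marker)."""
    kinds = [e.kind for e in elements]
    if len(kinds) < 5 or kinds[0] != "5' LTR" or kinds[-1] != "3' LTR":
        return False

    inner = kinds[1:-1]
    ires_positions = [i for i, k in enumerate(inner) if k == "IRES"]
    if not ires_positions:
        return False  # no IRES means no internal initiation of translation

    for i in ires_positions:
        if i == 0 or i == len(inner) - 1:
            return False  # IRES must sit between two coding elements
        if inner[i - 1] not in CODING or inner[i + 1] not in CODING:
            return False
    return True


if __name__ == "__main__":
    vector = [
        Element("5' LTR", "5' LTR"),
        Element("gene", "gene of interest"),
        Element("IRES", "encephalomyocarditis virus IRES"),
        Element("marker", "G418 resistance"),
        Element("3' LTR", "3' LTR"),
    ]
    print(is_valid_cassette(vector))
```

An internal promoter placed between the 5' LTR and the first gene is accepted by the same check, since only the neighbors of each IRES are constrained.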

As used herein, the term "retrovirus" refers to a retroviral particle which is capable of 
entering a cell (i.e., the particle contains a membrane-associated protein such as an envelope 
protein or a viral G glycoprotein which can bind to the host cell surface and facilitate entry of 
the viral particle into the cytoplasm of the host cell) and integrating the retroviral genome (as 
a double-stranded provirus) into the genome of the host cell. The term "retrovirus" 
encompasses Oncovirinae (e.g., Moloney murine leukemia virus (MoMOLV), Moloney murine 
sarcoma virus (MoMSV), and Mouse mammary tumor virus (MMTV)), Spumavirinae, and 
Lentivirinae (e.g., Human immunodeficiency virus, Simian immunodeficiency virus, Equine 
infectious anemia virus, and Caprine arthritis-encephalitis virus; See, e.g., U.S. Pat. Nos. 
5,994,136 and 6,013,516, both of which are incorporated herein by reference). 

As used herein, the term "retroviral vector" refers to a retrovirus that has been 
modified to express a gene of interest. Retroviral vectors can be used to transfer genes 
efficiently into host cells by exploiting the viral infectious process. Foreign or heterologous 
genes cloned (i.e., inserted using molecular biological techniques) into the retroviral genome 
can be delivered efficiently to host cells which are susceptible to infection by the retrovirus. 
Through well known genetic manipulations, the replicative capacity of the retroviral genome 
can be destroyed. The resulting replication-defective vectors can be used to introduce new 
genetic material to a cell but they are unable to replicate. A helper virus or packaging cell 
line can be used to permit vector particle assembly and egress from the cell. Such retroviral 
vectors comprise a replication-deficient retroviral genome containing a nucleic acid sequence 
encoding at least one gene of interest (i.e., a polycistronic nucleic acid sequence can encode 
more than one gene of interest), a 5' retroviral long terminal repeat (5' LTR), and a 3' 
retroviral long terminal repeat (3' LTR). 

The term "pseudotyped retroviral vector" refers to a retroviral vector containing a 
heterologous membrane protein. The term "membrane-associated protein" refers to a protein 
(e.g., a viral envelope glycoprotein or the G proteins of viruses in the Rhabdoviridae family 
such as VSV, Piry, Chandipura and Mokola) which is associated with the membrane 
surrounding a viral particle; these membrane-associated proteins mediate the entry of the viral 
particle into the host cell. The membrane-associated protein may bind to specific cell surface 
protein receptors, as is the case for retroviral envelope proteins, or the membrane-associated 
protein may interact with a phospholipid component of the plasma membrane of the host cell, 
as is the case for the G proteins derived from members of the Rhabdoviridae family. 

The term "heterologous membrane-associated protein" refers to a membrane-associated 
protein which is derived from a virus which is not a member of the same viral class or family 
as that from which the nucleocapsid protein of the vector particle is derived. "Viral class or 
family" refers to the taxonomic rank of class or family, as assigned by the International 
Committee on Taxonomy of Viruses. 

The term "Rhabdoviridae" refers to a family of enveloped RNA viruses that infect 
animals, including humans, and plants. The Rhabdoviridae family encompasses the genus 
Vesiculovirus which includes vesicular stomatitis virus (VSV), Cocal virus, Piry virus, 
Chandipura virus, and Spring viremia of carp virus (sequences encoding the Spring viremia of 
carp virus are available under GenBank accession number U18101). The G proteins of 
viruses in the Vesiculovirus genus are virally-encoded integral membrane proteins that form 
externally projecting homotrimeric spike glycoprotein complexes that are required for 
receptor binding and membrane fusion. The G proteins of viruses in the Vesiculovirus genus 
have a covalently bound palmitic acid (C16) moiety. The amino acid sequences of the G 
proteins from the Vesiculoviruses are fairly well conserved. For example, the Piry virus G 
protein shares about 38% identity and about 55% similarity with the VSV G proteins (several 
strains of VSV are known, e.g., Indiana, New Jersey, Orsay, San Juan, etc., and their G 
proteins are highly homologous). The Chandipura virus G protein and the VSV G proteins 
share about 37% identity and 52% similarity. Given the high degree of conservation (amino 
acid sequence) and the related functional characteristics (e.g., binding of the virus to the host 
cell and fusion of membranes, including syncytia formation) of the G proteins of the 
Vesiculoviruses, the G proteins from non-VSV Vesiculoviruses may be used in place of the 
VSV G protein for the pseudotyping of viral particles. The G proteins of the Lyssa viruses 
(another genus within the Rhabdoviridae family) also share a fair degree of conservation with 
the VSV G proteins and function in a similar manner (e.g., mediate fusion of membranes) and 
therefore may be used in place of the VSV G protein for the pseudotyping of viral particles. 
The Lyssa viruses include the Mokola virus and the Rabies viruses (several strains of Rabies 
virus are known and their G proteins have been cloned and sequenced). The Mokola virus G 
protein shares stretches of homology (particularly over the extracellular and transmembrane 
domains) with the VSV G proteins, showing about 31% identity and 48% similarity with 
the VSV G proteins. Preferred G proteins share at least 25% identity, preferably at least 30% 
identity and most preferably at least 35% identity with the VSV G proteins. The VSV G 
protein from the New Jersey strain (the sequence of this G protein is provided in GenBank 
accession numbers M27165 and M21557) is employed as the reference VSV G protein. 
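
The identity thresholds recited above can be illustrated with a short Python sketch. It is offered only as an example under stated assumptions: the two sequences are assumed to be already aligned to equal length with "-" marking gaps, percent identity is computed over non-gap columns (other conventions for the denominator exist), and the function names and tier labels are hypothetical. The strings in the example are toy data, not G protein sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned, non-gap columns of two sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1

    if compared == 0:
        raise ValueError("no aligned residues to compare")
    return 100.0 * matches / compared


def preference_tier(identity_to_vsv_g):
    """Map percent identity with the reference VSV G protein to the
    tiers recited in the text (at least 25%, 30%, and 35%)."""
    if identity_to_vsv_g >= 35.0:
        return "most preferred"
    if identity_to_vsv_g >= 30.0:
        return "more preferred"
    if identity_to_vsv_g >= 25.0:
        return "preferred"
    return "below threshold"


if __name__ == "__main__":
    # Toy aligned strings for demonstration only.
    print(percent_identity("MKC-LL", "MRCALL"))
    # Identities quoted in the text: Piry ~38%, Chandipura ~37%, Mokola ~31%.
    for name, identity in [("Piry", 38.0), ("Chandipura", 37.0), ("Mokola", 31.0)]:
        print(name, preference_tier(identity))
```

In practice the alignment itself would come from a sequence alignment program, and the New Jersey strain VSV G protein would serve as the reference sequence.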

As used herein, the term "lentivirus vector" refers to retroviral vectors derived from 
the Lentiviridae family (e.g., human immunodeficiency virus, simian immunodeficiency virus, 
equine infectious anemia virus, and caprine arthritis-encephalitis virus) that are capable of 
integrating into non-dividing cells (See, e.g., U.S. Pat. Nos. 5,994,136 and 6,013,516, both of 
which are incorporated herein by reference). 

The term "pseudotyped lentivirus vector" refers to a lentivirus vector containing a 
heterologous membrane protein (e.g., a viral envelope glycoprotein or the G proteins of 
viruses in the Rhabdoviridae family such as VSV, Piry, Chandipura and Mokola). 

As used herein, the term "transposon" refers to transposable elements (e.g., Tn5, Tn7, 
and Tn10) that can move or transpose from one position to another in a genome. In general, 
the transposition is controlled by a transposase. The term "transposon vector," as used herein, 
refers to a vector encoding a nucleic acid of interest flanked by the terminal ends of a 
transposon. Examples of transposon vectors include, but are not limited to, those described in 
U.S. Pat. Nos. 6,027,722; 5,958,775; 5,968,785; 5,965,443; and 5,719,055, all of which are 
incorporated herein by reference. 

As used herein, the term "adeno-associated virus (AAV) vector" refers to a vector 
derived from an adeno-associated virus serotype, including without limitation, AAV-1, AAV-2, 
AAV-3, AAV-4, AAV-5, AAVX7, etc. AAV vectors can have one or more of the AAV 
wild-type genes deleted in whole or part, preferably the rep and/or cap genes, but retain 
functional flanking ITR sequences. 

AAV vectors can be constructed using recombinant techniques that are known in the 
art to include one or more heterologous nucleotide sequences flanked on both ends (5' and 3') 
with functional AAV ITRs. In the practice of the invention, an AAV vector can include at 
least one AAV ITR and a suitable promoter sequence positioned upstream of the heterologous 
nucleotide sequence and at least one AAV ITR positioned downstream of the heterologous 
sequence. A "recombinant AAV vector plasmid" refers to one type of recombinant AAV 
vector wherein the vector comprises a plasmid. As with AAV vectors in general, 5' and 3' 
ITRs flank the selected heterologous nucleotide sequence. 

AAV vectors can also include transcription sequences such as polyadenylation sites, as 
well as selectable markers or reporter genes, enhancer sequences, and other control elements 
which allow for the induction of transcription. Such control elements are described above. 

As used herein, the term "AAV virion" refers to a complete virus particle. An AAV 
virion may be a wild type AAV virus particle (comprising a linear, single-stranded AAV 
nucleic acid genome associated with an AAV capsid, i.e., a protein coat), or a recombinant 
AAV virus particle (described below). In this regard, single-stranded AAV nucleic acid 
molecules (either the sense/coding strand or the antisense/anticoding strand as those terms are 
generally defined) can be packaged into an AAV virion; both the sense and the antisense 
strands are equally infectious. 

As used herein, the term "recombinant AAV virion" or "rAAV" is defined as an 
infectious, replication-defective virus composed of an AAV protein shell encapsidating (i.e., 
surrounding with a protein coat) a heterologous nucleotide sequence, which in turn is flanked 
5' and 3' by AAV ITRs. A number of techniques for constructing recombinant AAV virions 
are known in the art (See, e.g., U.S. Patent No. 5,173,414; WO 92/01070; WO 93/03769; 
Lebkowski et al., Molec. Cell. Biol. 8:3988-3996 [1988]; Vincent et al., Vaccines 90 [1990] 
(Cold Spring Harbor Laboratory Press); Carter, Current Opinion in Biotechnology 3:533-539 
[1992]; Muzyczka, Current Topics in Microbiol. and Immunol. 158:97-129 [1992]; Kotin, 
Human Gene Therapy 5:793-801 [1994]; Shelling and Smith, Gene Therapy 1:165-169 
[1994]; and Zhou et al., J. Exp. Med. 179:1867-1875 [1994], all of which are incorporated 
herein by reference). 

Suitable nucleotide sequences for use in AAV vectors (and, indeed, any of the vectors 
described herein) include any functionally relevant nucleotide sequence. Thus, the AAV 
vectors of the present invention can comprise any desired gene that encodes a protein that is 
defective or missing from a target cell genome or that encodes a non-native protein having a 
desired biological or therapeutic effect (e.g., an antiviral function), or the sequence can 
correspond to a molecule having an antisense or ribozyme function. Suitable genes include 
those used for the treatment of inflammatory diseases, autoimmune, chronic and infectious 
diseases, including such disorders as AIDS, cancer, neurological diseases, cardiovascular 
disease, hypercholesterolemia; various blood disorders including various anemias, thalassemias and 
hemophilia; genetic defects such as cystic fibrosis, Gaucher's Disease, adenosine deaminase 
(ADA) deficiency, emphysema, etc. A number of antisense oligonucleotides (e.g., short 
oligonucleotides complementary to sequences around the translational initiation site (AUG 
codon) of an mRNA) that are useful in antisense therapy for cancer and for viral diseases 
have been described in the art. (See, e.g., Han et al., Proc. Natl. Acad. Sci. USA 
88:4313-4317 [1991]; Uhlmann et al., Chem. Rev. 90:543-584 [1990]; Helene et al., Biochim. 
Biophys. Acta. 1049:99-125 [1990]; Agarwal et al., Proc. Natl. Acad. Sci. USA 85:7079-7083 
[1989]; and Heikkila et al., Nature 328:445-449 [1987]). For a discussion of suitable 
ribozymes, see, e.g., Cech et al. (1992) J. Biol. Chem. 267:17479-17482 and U.S. Patent No. 
5,225,347, incorporated herein by reference. 

By "adeno-associated virus inverted terminal repeats" or "AAV ITRs" is meant the art- 
recognized palindromic regions found at each end of the AAV genome which function 
together in cis as origins of DNA replication and as packaging signals for the virus. For use 
with the present invention, flanking AAV ITRs are positioned 5' and 3' of one or more 
selected heterologous nucleotide sequences and, together with the rep coding region or the 
Rep expression product, provide for the integration of the selected sequences into the genome 
of a target cell. 

The nucleotide sequences of AAV ITR regions are known (See, e.g., Kotin, Human 
Gene Therapy 5:793-801 [1994]; Berns, K.I. "Parvoviridae and their Replication" in 
Fundamental Virology, 2nd Edition, (B.N. Fields and D.M. Knipe, eds.) for the AAV-2 
sequence). As used herein, an "AAV ITR" need not have the wild-type nucleotide sequence 
depicted, but may be altered, e.g., by the insertion, deletion or substitution of nucleotides. 
Additionally, the AAV ITR may be derived from any of several AAV serotypes, including 
without limitation, AAV-1, AAV-2, AAV-3, AAV-4, AAV-5, AAVX7, etc. The 5' and 3' 
ITRs which flank a selected heterologous nucleotide sequence need not necessarily be 
identical or derived from the same AAV serotype or isolate, so long as they function as 
intended, i.e., to allow for the integration of the associated heterologous sequence into the 
target cell genome when the rep gene is present (either on the same or on a different vector), 
or when the Rep expression product is present in the target cell. 

As used herein, the term "in vitro" refers to an artificial environment and to 
processes or reactions that occur within an artificial environment. In vitro environments can 
consist of, but are not limited to, test tubes and cell cultures. The term "in vivo" refers to the 
natural environment (e.g., an animal or a cell) and to processes or reactions that occur within a 
natural environment. 

As used herein, the term "clonally derived" refers to a cell line that is derived from a 
single cell. 

As used herein, the term "non-clonally derived" refers to a cell line that is derived 
from more than one cell. 

As used herein, the term "passage" refers to the process of diluting a culture of cells 
that has grown to a particular density or confluency (e.g., 70% or 80% confluent), and then 
allowing the diluted cells to regrow to the particular density or confluency desired (e.g., by 
replating the cells or establishing a new roller bottle culture with the cells). 

As used herein, the term "stable," when used in reference to a genome, refers to the 
stable maintenance of the information content of the genome from one generation to the next, 
or, in the particular case of a cell line, from one passage to the next. Accordingly, a genome 
is considered to be stable if no gross changes occur in the genome (e.g., a gene is deleted or a 
chromosomal translocation occurs). The term "stable" does not exclude subtle changes that 
may occur to the genome such as point mutations. 

As used herein, the term "response," when used in reference to an assay, refers to the 
generation of a detectable signal (e.g., accumulation of reporter protein, increase in ion 
concentration, accumulation of a detectable chemical product). 

As used herein, the term "membrane receptor protein" refers to membrane spanning 
proteins that bind a ligand (e.g., a hormone or neurotransmitter). As is known in the art, 
protein phosphorylation is a common regulatory mechanism used by cells to selectively 
modify proteins carrying regulatory signals from outside the cell to the nucleus. The proteins 
that execute these biochemical modifications are a group of enzymes known as protein 
kinases. They may further be defined by the substrate residue that they target for 
phosphorylation. One group of protein kinases are the tyrosine kinases (TKs) which 
selectively phosphorylate a target protein on its tyrosine residues. Some tyrosine kinases are 
membrane-bound receptors (RTKs), and, upon activation by a ligand, can autophosphorylate 
as well as modify substrates. The initiation of sequential phosphorylation by ligand 
stimulation is a paradigm that underlies the action of such effectors as, for example, epidermal 
growth factor (EGF), insulin, platelet-derived growth factor (PDGF), and fibroblast growth 
factor (FGF). The receptors for these ligands are tyrosine kinases and provide the interface 
between the binding of a ligand (hormone, growth factor) to a target cell and the transmission 
of a signal into the cell by the activation of one or more biochemical pathways. Ligand 
binding to a receptor tyrosine kinase activates its intrinsic enzymatic activity (See, e.g., 
Ullrich and Schlessinger, Cell 61:203-212 [1990]). Tyrosine kinases can also be cytoplasmic, 
non-receptor-type enzymes and act as a downstream component of a signal transduction 
pathway. 

As used herein, the term "signal transduction protein" refers to proteins that are 
activated or otherwise affected by ligand binding to a membrane receptor protein or some 
other stimulus. Examples of signal transduction proteins include adenylyl cyclase, phospholipase 
C, and G-proteins. Many membrane receptor proteins are coupled to G-proteins (i.e., 
G-protein coupled receptors (GPCRs); for a review, see Neer, Cell 80:249-257 [1995]). 
Typically, GPCRs contain seven transmembrane domains. Putative GPCRs can be identified 
on the basis of sequence homology to known GPCRs. 

GPCRs mediate signal transduction across a cell membrane upon the binding of a 
ligand to an extracellular portion of a GPCR. The intracellular portion of a GPCR interacts 
with a G-protein to modulate signal transduction from outside to inside a cell. A GPCR is 
therefore said to be "coupled" to a G-protein. G-proteins are composed of three polypeptide 
subunits: an α subunit, which binds and hydrolyses GTP, and a dimeric βγ subunit. In the 
basal, inactive state, the G-protein exists as a heterotrimer of the α and βγ subunits. When 
the G-protein is inactive, guanosine diphosphate (GDP) is associated with the α subunit of the 
G-protein. When a GPCR is bound and activated by a ligand, the GPCR binds to the 
G-protein heterotrimer and decreases the affinity of the Gα subunit for GDP. In its active 
state, the Gα subunit exchanges GDP for guanosine triphosphate (GTP) and the active Gα subunit 
disassociates from both the receptor and the dimeric βγ subunit. The disassociated, active Gα 
subunit transduces signals to effectors that are "downstream" in the G-protein signaling 
pathway within the cell. Eventually, the G-protein's endogenous GTPase activity returns 
the active Gα subunit to its inactive state, in which it is associated with GDP and the dimeric βγ 
subunit. 

Numerous members of the heterotrimeric G-protein family have been cloned, including 
more than 20 genes encoding various Gα subunits. The various Gα subunits have been
categorized into four families, on the basis of amino acid sequences and functional homology.
These four families are termed Gαs, Gαi, Gαq, and Gα12. Functionally, these four families
differ with respect to the intracellular signaling pathways that they activate and the GPCR to 
which they couple. 

For example, certain GPCRs normally couple with Gαs and, through Gαs, these
GPCRs stimulate adenylyl cyclase activity. Other GPCRs normally couple with Gαq, and
through Gαq, these GPCRs can activate phospholipase C (PLC), such as the β isoform of
phospholipase C (i.e., PLCβ; Sternweis and Smrcka, Trends in Biochem. Sci. 17:502-506
[1992]).

As used herein, the term "immunoglobulin" refers to proteins which bind a specific 
antigen. Immunoglobulins include, but are not limited to, polyclonal, monoclonal, chimeric,
and humanized antibodies, Fab fragments, and F(ab')2 fragments, and include immunoglobulins
of the following classes: IgG, IgA, IgM, IgD, IgE, and secreted immunoglobulins (sIg).
Immunoglobulins generally comprise two identical heavy chains (γ, α, μ, δ, or ε) and two
light chains (κ or λ).

As used herein, the term "antigen binding protein" refers to proteins which bind to a 
specific antigen. "Antigen binding proteins" include, but are not limited to, immunoglobulins, 
including polyclonal, monoclonal, chimeric, and humanized antibodies; Fab fragments, F(ab')2 
fragments, and Fab expression libraries; and single chain antibodies. Various procedures 
known in the art are used for the production of polyclonal antibodies. For the production of 
an antibody, various host animals can be immunized by injection with the peptide 
corresponding to the desired epitope including but not limited to rabbits, mice, rats, sheep, 
goats, etc. In a preferred embodiment, the peptide is conjugated to an immunogenic carrier
(e.g., diphtheria toxoid, bovine serum albumin (BSA), or keyhole limpet hemocyanin (KLH)).
Various adjuvants are used to increase the immunological response, depending on the host 
species, including but not limited to Freund's (complete and incomplete), mineral gels such as 
aluminum hydroxide, surface active substances such as lysolecithin, pluronic polyols, 
polyanions, peptides, oil emulsions, keyhole limpet hemocyanins, dinitrophenol, and 
potentially useful human adjuvants such as BCG (Bacille Calmette-Guerin) and 
Corynebacterium parvum. 

For preparation of monoclonal antibodies, any technique that provides for the 
production of antibody molecules by continuous cell lines in culture may be used (See, e.g., 
Harlow and Lane, Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory Press, 
Cold Spring Harbor, NY). These include, but are not limited to, the hybridoma technique 
originally developed by Kohler and Milstein (Kohler and Milstein, Nature 256:495-497 
[1975]), as well as the trioma technique, the human B-cell hybridoma technique (See, e.g.,
Kozbor et al., Immunol. Today 4:72 [1983]), and the EBV-hybridoma technique to produce
human monoclonal antibodies (Cole et al., in Monoclonal Antibodies and Cancer Therapy,
Alan R. Liss, Inc., pp. 77-96 [1985]).

According to the invention, techniques described for the production of single chain 
antibodies (U.S. Patent 4,946,778; herein incorporated by reference) can be adapted to 
produce specific single chain antibodies as desired. An additional embodiment of the 
invention utilizes the techniques known in the art for the construction of Fab expression 
libraries (Huse et al., Science 246:1275-1281 [1989]) to allow rapid and easy identification of
monoclonal Fab fragments with the desired specificity. 

Antibody fragments that contain the idiotype (antigen binding region) of the antibody 
molecule can be generated by known techniques. For example, such fragments include but 
are not limited to: the F(ab')2 fragment that can be produced by pepsin digestion of an 
antibody molecule; the Fab' fragments that can be generated by reducing the disulfide bridges 
of an F(ab')2 fragment, and the Fab fragments that can be generated by treating an antibody 
molecule with papain and a reducing agent. 

Genes encoding antigen binding proteins can be isolated by methods known in the art. 
In the production of antibodies, screening for the desired antibody can be accomplished by
techniques known in the art (e.g., radioimmunoassay, ELISA (enzyme-linked immunosorbent
assay), "sandwich" immunoassays, immunoradiometric assays, gel diffusion precipitin
reactions, immunodiffusion assays, in situ immunoassays (using colloidal gold, enzyme or
radioisotope labels, for example), Western blots, precipitation reactions, agglutination assays
(e.g., gel agglutination assays, hemagglutination assays, etc.), complement fixation assays,
immunofluorescence assays, protein A assays, and immunoelectrophoresis assays, etc.).

As used herein, the term "reporter gene" refers to a gene encoding a protein that may 
be assayed. Examples of reporter genes include, but are not limited to, luciferase (See, e.g.,
deWet et al., Mol. Cell. Biol. 7:725 [1987] and U.S. Pat. Nos. 6,074,859; 5,976,796;
5,674,713; and 5,618,682; all of which are incorporated herein by reference), green
fluorescent protein (e.g., GenBank Accession Number U43284; a number of GFP variants are
commercially available from CLONTECH Laboratories, Palo Alto, CA), chloramphenicol
acetyltransferase, β-galactosidase, alkaline phosphatase, and horseradish peroxidase.

As used herein, the term "purified" refers to molecules, either nucleic or amino acid 
sequences, that are removed from their natural environment, isolated or separated. An 
"isolated nucleic acid sequence" is therefore a purified nucleic acid sequence. "Substantially 
purified" molecules are at least 60% free, preferably at least 75% free, and more preferably at 
least 90% free from other components with which they are naturally associated. 

DETAILED DESCRIPTION OF THE INVENTION 

The present invention provides novel regulatory sequences for use in expression 
vectors. In some embodiments, the present invention provides retroviral expression vectors 
containing novel regulatory elements. In addition, in still other embodiments, the present 
invention provides methods for expressing proteins of interest in host cells. In particularly 
preferred embodiments, the present invention provides methods for expressing two chains of a 
multisubunit protein (e.g., a heavy chain and a light chain of an immunoglobulin or the 
subunits of follicle stimulating hormone) in a nearly equal ratio. These methods take 
advantage of the novel regulatory sequences and vectors of the present invention to solve 
problems in the prior art.

I. Components of Retroviral Expression Vectors

In particularly preferred embodiments, the retroviral vectors of the present invention 
include the following elements in operable association: a) a 5' LTR; b) a packaging signal; c) 
a 3' LTR, and d) a nucleic acid encoding a protein of interest located between the 5' and 3' 
LTRs. In addition, in some preferred embodiments, novel compositions, including, but not 
limited to those described below are included in expression vectors in order to aid in the 
expression, secretion and purification of proteins of interest. The following novel elements 
are described in more detail below: bovine/human hybrid alpha-lactalbumin (α-LA) promoter
(A); mutant RNA export element (B); and internal ribosome entry site (C). 

A. Bovine/Human Hybrid Alpha-Lactalbumin Promoter

In some embodiments, the present invention provides a hybrid α-lactalbumin (α-LA)
promoter. It is contemplated that the hybrid promoter may be constructed from portions of
any two or more mammalian α-lactalbumin promoters (e.g., human, bovine, goat, sheep,
rabbit, or mouse α-lactalbumin promoters among others; see, e.g., GenBank Accession
numbers AF124257; AF123893; AX067504; Soulier et al., Transgenic Res. 8(1):23-31 (1999);
McKee et al., Nat. Biotech. 16(7):647-51 (1998); Lubon et al., Biochem. J. 256(2):391-6
(1988); and U.S. Pat. No. 5,530,177). In some embodiments, the portion of at least one of
the promoters contributing to the hybrid is at least 50 nucleotides in length. In preferred
embodiments, that portion is at least 100 nucleotides in length, and in particularly preferred
embodiments, that portion is at least 500 nucleotides in length, with the
portion of the at least one other promoter contributing to the hybrid being of similar or longer
length. Once constructed, the hybrid promoters can be assayed for functionality by operably
linking the promoter to a reporter gene such as beta-galactosidase, green fluorescent protein,
or luciferase, creating a transgenic animal such as a transgenic mouse or bovine that comprises
the resulting construct, and assaying various tissues of the resulting transgenic animal to
determine the specificity of expression from the hybrid promoter. In preferred embodiments,
expression from the hybrid promoter is substantially specific to the mammary gland, and in
particular to mammary epithelial cells, with no or only trace levels of expression in other
tissues.

In particularly preferred embodiments, the hybrid promoter is a bovine/human hybrid 
α-lactalbumin (α-LA) promoter (SEQ ID NO: 1). The human portion of the promoter was
derived from human genomic DNA and contains bases from +15 relative to the transcription 
start point to -600 relative to the transcription start point. The bovine portion is attached to 
the end of the human portion and corresponds to bases -550 to -2000 relative to the 
transcription start point. 
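
The coordinate ranges given above can be turned into approximate segment lengths. The following is a minimal sketch only. It assumes that both end positions are included in each portion and that positions are numbered without a position 0 (i.e., -1 is followed directly by +1); neither convention is stated in the text, and the function name is hypothetical.

```python
def span_length(start: int, end: int) -> int:
    """Count bases between two positions given relative to the
    transcription start point, both ends inclusive.

    Assumption (not stated in the text): there is no position 0,
    so -1 is immediately followed by +1.
    """
    if start == 0 or end == 0:
        raise ValueError("position 0 does not exist under this convention")
    lo, hi = sorted((start, end))
    length = hi - lo + 1
    if lo < 0 < hi:
        length -= 1  # the span crosses the start point; skip the missing 0
    return length


# Human portion: -600 to +15; bovine portion: -2000 to -550.
human_len = span_length(-600, 15)
bovine_len = span_length(-2000, -550)
print(human_len, bovine_len, human_len + bovine_len)
```

Under these assumptions the two portions are 615 and 1451 bases long. The actual length of SEQ ID NO: 1 may differ if a different boundary convention was used.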

The hybrid promoter preferably used in the present invention utilizes a region of the
human promoter that contained an internal polyadenylation signal. The internal
polyadenylation signal was removed by mutation. The mutation was at base 2012 and involved a
change from A to T. The present invention is not limited to any particular mechanism of
action. Indeed, an understanding of the mechanism is not required to practice the present
invention. Nevertheless, it is contemplated that the removal of polyadenylation signals improves
retroviral RNA production by eliminating premature mRNA termination problems. In
addition, it is contemplated that additional enhancer regions exist in the human, but not the 
bovine sequence. The hybrid promoter was constructed to take advantage of these additional 
sequences. Likewise, the hybrid promoter contains bovine elements that may or may not be 
found in the human promoter. 
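
The effect of a single A to T change on a polyadenylation signal can be illustrated with a short script. This is a sketch under stated assumptions: the text does not give the sequence of the internal signal, so the canonical hexamer AATAAA is assumed, and the toy sequence and helper names are hypothetical rather than taken from SEQ ID NO: 1.

```python
# Assumption: the canonical polyadenylation hexamer. The text does not
# specify which signal sequence was present in the human promoter.
POLY_A_SIGNAL = "AATAAA"


def find_motif(seq: str, motif: str = POLY_A_SIGNAL) -> list[int]:
    """Return 1-based start positions of every occurrence of motif."""
    seq = seq.upper()
    width = len(motif)
    return [i + 1 for i in range(len(seq) - width + 1)
            if seq[i:i + width] == motif]


def substitute(seq: str, position: int, expected: str, new: str) -> str:
    """Replace the base at a 1-based position, checking the original base."""
    idx = position - 1
    if seq[idx].upper() != expected:
        raise ValueError(f"expected {expected} at {position}, found {seq[idx]}")
    return seq[:idx] + new + seq[idx + 1:]


# Toy sequence (hypothetical) with one signal starting at position 5.
toy = "GCGCAATAAAGCGC"
mutated = substitute(toy, 8, "A", "T")   # A to T inside the hexamer
print(find_motif(toy), find_motif(mutated))
```

In the toy sequence the single substitution leaves no AATAAA hexamer, which is the kind of check that could be run on a real promoter sequence before and after mutation.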

B. RNA Export Element 

In some embodiments, the present invention comprises a mutant RNA export element 
(pre-mRNA processing element (PPE), Mertz sequence, or WPRE; See, e.g., U.S. Pat. Nos. 
5,914,267 and 5,686,120 and PCT Publication WO99/14310, all of which are incorporated 
herein by reference). The present invention is not limited to any particular mechanism of 
action. Indeed, an understanding of the mechanism is not required to practice the present 
invention. Nevertheless, it is contemplated that the use of RNA export elements allows or 
facilitates high levels of expression of the protein of interest without incorporating splice 
signals or introns in the nucleic acid sequence encoding the protein of interest. 

In some embodiments, a mutated PPE element is utilized. In some particularly
preferred embodiments, the PPE sequence is mutated to remove internal ATG sequences. The
present invention is not limited to any particular mechanism of action. Indeed, an
understanding of the mechanism is not required to practice the present invention.
Nevertheless, it is contemplated that the removal of internal start sequences prevents potential
unwanted translation initiation. In some embodiments utilizing a mutated PPE sequence,
bases 4, 112, 131, and 238 of SEQ ID NO: 2 were changed from a G to a T. In all cases,
these changes resulted in an ATG start codon being mutated to an ATT codon. In some
embodiments, the mutated PPE sequence is placed in the 5' untranslated region (UTR) of the
mRNA encoding the gene of interest. In other embodiments, the mutated PPE sequence is
placed in the 3' UTR of the mRNA encoding the gene of interest. In some preferred
embodiments, two mutated PPE sequences separated by a linker are placed in a head to tail
array (See, e.g., SEQ ID NO: 2). It has been shown that two copies of the sequence cause a
more dramatic effect on mRNA export. In other embodiments, 2-20 copies of the mutated
PPE sequence are placed in the mRNA encoding the gene of interest.
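
The G to T changes described above can be sketched in a few lines. This is an illustration only: the toy sequence and its positions are hypothetical stand-ins (SEQ ID NO: 2 is not reproduced here), and the function name is an assumption. Each listed position is treated as the 1-based position of the G that ends an ATG.

```python
def knock_out_start_codons(seq: str, g_positions: list[int]) -> str:
    """Change the G of each listed ATG to T, giving ATT.

    g_positions holds 1-based positions of the G in each ATG, mirroring
    how the text lists the mutated bases. Raises if a listed position
    is not the third base of an ATG.
    """
    bases = list(seq.upper())
    for pos in g_positions:
        codon = "".join(bases[pos - 3:pos])
        if codon != "ATG":
            raise ValueError(f"no ATG ending at position {pos}: {codon!r}")
        bases[pos - 1] = "T"
    return "".join(bases)


# Toy sequence (hypothetical): ATG at positions 2-4 and 7-9.
toy = "CATGCCATGAA"
mutated = knock_out_start_codons(toy, [4, 9])
print(mutated, "ATG" in mutated)
```

Checking that the triplet is ATG before editing guards against applying the positions to the wrong sequence or numbering scheme.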

Functional variants of the above sequences are easily identified by operably linking the 
variant sequence to a test gene in a vector, transfecting a host cell with the vector, and 
analyzing the host cell for expression of the test gene. Suitable test genes, host cells, and 
vectors are disclosed in the examples. 

C. Internal Ribosome Entry Site 

In some embodiments, the present invention comprises an internal ribosome entry site 
(IRES)/signal peptide sequence (e.g., SEQ ID NOs:3 and 12). The present invention 
contemplates that a variety of signal sequences may be fused with a variety of IRES 
sequences. Suitable signal sequences include those from α-lactalbumin, casein, tissue
plasminogen activator, serum albumin, and lactoferrin (See, e.g., Zwizinski et al.,
J. Biol. Chem. 255(16): 7973-77 [1980], Gray et al., Gene 39(2): 247-54 [1985], and Martial
et al., Science 205: 602-607 [1979]). Such secretion signal sequences are preferably derived
from genes encoding polypeptides secreted from the cell type targeted for tissue-specific
expression (e.g., secreted milk proteins for expression in and secretion from mammary
secretory cells). Suitable IRES sequences include, but are not limited to, those derived from
foot and mouth disease virus (FDV), encephalomyocarditis virus, poliovirus and RDV
(Scheper et al., Biochem. 76: 801-809 [1994]; Meyer et al., J. Virol. 69: 2819-2824 [1995];
Jang et al., J. Virol. 62: 2636-2643 [1988]; Haller et al., J. Virol. 66: 5075-5086
[1995]). Functional IRES/signal peptide sequences may be identified by operably linking two
genes with the sequence and an appropriate promoter, transfecting a host cell with the
construct, and assaying the host cell for production of the proteins encoded by the two genes.
Suitable genes, vector constructs, and host cells for such screening are provided in the
examples. In preferred embodiments, the coding sequences for the IRES and signal peptide
are adjacent to one another, with no intervening coding sequences (i.e., they may be separated
by noncoding sequences in some instances).

The present invention is not limited to any particular mechanism of action. Indeed, an 
understanding of the mechanism is not required to practice the present invention. The IRES 
allows translation of the gene to start at the IRES sequence, thereby resulting in the 
expression of two genes of interest in the same construct. The bovine α-lactalbumin signal
peptide or casein signal peptide causes extracellular secretion of expressed protein products. 

In some embodiments, the initial ATG of the signal peptide is attached to the IRES in 
order to allow the most efficient translation initiation from the IRES. In some embodiments, 
the second codon of the signal peptide is mutated from an ATG to a GCC, changing the 
second amino acid of the α-lactalbumin signal peptide from a methionine to an alanine. The
present invention is not limited to any particular mechanism of action. Indeed, an 
understanding of the mechanism is not required to practice the present invention. 
Nevertheless, it is contemplated that this mutation facilitates more efficient translation 
initiation by the IRES. In some embodiments, the (IRES)/signal peptide is inserted into a 
vector between two genes of interest. In these embodiments, the (IRES)/signal peptide creates 
a second translation initiation site, allowing for the expression of two polypeptides from the 
same expression vector. In other words, a single transcript is produced that encodes two 
different polypeptides (e.g., the heavy and light chains of an immunoglobulin). 

In some embodiments, the signal peptide is derived from α-lactalbumin. In other
embodiments, the present invention comprises an internal ribosome entry site (IRES)/modified
bovine α-S1 casein signal peptide fusion protein (SEQ ID NO: 12). The present invention is
not limited to any particular mechanism of action. Indeed, an understanding of the
mechanism is not required to practice the present invention. The IRES allows translation of
the gene to start at the IRES sequence, allowing the expression of two genes of interest in the
same construct. The bovine α-S1 casein signal peptide causes secretion of expressed protein
products.

In some embodiments, the second codon of the bovine α-S1 casein signal peptide is
mutated from an AAA to a GCC. The mutation results in the second amino acid of the signal
peptide being changed from a lysine to an alanine. In some embodiments, the third codon of
the signal peptide is mutated from a CTT to a TTG, a change which does not result in an
amino acid substitution. The present invention is not limited to any particular mechanism of
action. Indeed, an understanding of the mechanism is not required to practice the present
invention. Nevertheless, it is contemplated that this mutation allows more efficient translation
initiation by the IRES.
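
The amino acid consequences of the codon changes discussed in this section can be checked against the standard genetic code. The sketch below uses only the five codons named in the text (ATG, GCC, AAA, CTT, TTG); the helper name and output format are illustrative assumptions.

```python
# Subset of the standard genetic code covering only the codons
# mentioned in the text (DNA coding-strand notation).
CODON_TO_AA = {
    "ATG": "Met",
    "GCC": "Ala",
    "AAA": "Lys",
    "CTT": "Leu",
    "TTG": "Leu",
}


def describe_change(old: str, new: str) -> str:
    """Summarize a codon change and whether it alters the amino acid."""
    aa_old, aa_new = CODON_TO_AA[old], CODON_TO_AA[new]
    kind = "silent" if aa_old == aa_new else "substitution"
    return f"{old}->{new}: {aa_old}->{aa_new} ({kind})"


for old, new in [("ATG", "GCC"), ("AAA", "GCC"), ("CTT", "TTG")]:
    print(describe_change(old, new))
```

Under the standard code, ATG to GCC replaces methionine with alanine, AAA to GCC replaces lysine with alanine, and CTT to TTG leaves leucine unchanged.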

II. Retroviral Expression Vectors 

In some embodiments, the present invention comprises retroviral expression vectors. 
Retroviruses (family Retroviridae) are generally divided into three groups: the spumaviruses
(e.g., human foamy virus); the lentiviruses (e.g., human immunodeficiency virus and sheep
visna virus); and the oncoviruses (e.g., MLV and Rous sarcoma virus).

Retroviruses are enveloped (i.e., surrounded by a host cell-derived lipid bilayer 
membrane) single-stranded RNA viruses which infect animal cells. When a retrovirus infects 
a cell, its RNA genome is converted into a double-stranded linear DNA form (i.e., it is
reverse transcribed). The DNA form of the virus is then integrated into the host cell genome 
as a provirus. The provirus serves as a template for the production of additional viral 
genomes and viral mRNAs. Mature viral particles containing two copies of genomic RNA 
bud from the surface of the infected cell. The viral particle comprises the genomic RNA, 
reverse transcriptase and other pol gene products inside the viral capsid (containing the viral 
gag gene products) which is surrounded by a lipid bilayer membrane derived from the host 
cell containing the viral envelope glycoproteins (also referred to as membrane-associated 
proteins). 

The genomic organization of numerous retroviruses is well known in the art and this
has allowed the adaptation of the retroviral genome to produce retroviral vectors. The
production of a recombinant retroviral vector carrying a gene of interest is typically achieved 
in two stages. 

First, the gene of interest is inserted into a retroviral vector which contains the 
sequences necessary for the efficient expression of the gene of interest (including promoter 
and/or enhancer elements which may be provided by the viral long terminal repeats (LTRs) or 
by an internal promoter/enhancer and relevant splicing signals), sequences required for the 
efficient packaging of the viral RNA into infectious virions (e.g., the packaging signal (Psi), 
the tRNA primer binding site (-PBS), the 3' regulatory sequences required for reverse 
transcription (+PBS)) and the viral LTRs. The LTRs contain sequences required for the 
association of viral genomic RNA, reverse transcriptase and integrase functions, and sequences 
involved in directing the expression of the genomic RNA to be packaged in viral particles. 
For safety reasons, many recombinant retroviral vectors lack functional copies of the genes 
which are essential for viral replication (these essential genes are either deleted or disabled); 
therefore, the resulting virus is said to be "replication defective". 

Second, following the construction of the recombinant vector, the vector DNA is 
introduced into a packaging cell line. Packaging cell lines provide viral proteins required in 
trans for the packaging of the viral genomic RNA into viral particles having the desired host 
range (i.e., the viral-encoded gag, pol and env proteins). The host range is controlled, in part, 
by the type of envelope gene product expressed on the surface of the viral particle. 
Packaging cell lines may express ecotropic, amphotropic or xenotropic envelope gene
products. Alternatively, the packaging cell line may lack sequences encoding a viral envelope 
(env) protein. In this case the packaging cell line will package the viral genome into particles 
lacking a membrane-associated protein (e.g., an env protein). In order to produce viral 
particles containing a membrane associated protein which will permit entry of the virus into a 
cell, the packaging cell line containing the retroviral sequences is commonly transfected with 
sequences encoding a membrane-associated protein (e.g., the G protein of vesicular stomatitis
virus (VSV)). The transfected packaging cell will then produce viral particles which contain
the membrane-associated protein expressed by the transfected packaging cell line; these viral
particles, which contain viral genomic RNA derived from one virus encapsidated by the
envelope proteins of another virus, are said to be "pseudotyped virus particles".

The retroviral vectors of the present invention can be further modified to include 
additional regulatory sequences. As described above, the retroviral vectors of the present 
invention include the following elements in operable association: a) a 5' LTR; b) a packaging 
signal; c) a 3' LTR; and d) a nucleic acid encoding a protein of interest located between the 
5' and 3' LTRs. In some embodiments of the present invention, the nucleic acid of interest 
may be arranged in opposite orientation to the 5' LTR when transcription from an internal 
promoter is desired. Suitable internal promoters include, but are not limited to, the
alpha-lactalbumin promoter, the CMV promoter, and the thymidine kinase promoter.

In other embodiments of the present invention, where secretion of the protein of 
interest is desired, the vectors are modified by including a signal peptide sequence in operable 
association with the protein of interest. The sequences of several suitable signal peptides are 
known in the art, including, but not limited to, those derived from tissue plasminogen 
activator, human growth hormone, lactoferrin, alpha S1-casein, and alpha-lactalbumin.

In other embodiments of the present invention, the vectors are modified by 
incorporating one or more of the elements described above, including, but not limited to, an 
RNA export element, a PPE element, and an IRES/bovine α-lactalbumin signal sequence.

The retroviral vectors of the present invention may further comprise a selectable 
marker which facilitates selection of transformed cells. A number of selectable markers 
known in the art find use in the present invention, including, but not limited to the bacterial 
aminoglycoside 3' phosphotransferase gene (also referred to as the "neo gene") that confers
resistance to the drug G418 in mammalian cells, the bacterial hygromycin B
phosphotransferase (hyg) gene that confers resistance to the antibiotic hygromycin, and the
bacterial xanthine-guanine phosphoribosyl transferase gene (also referred to as the "gpt gene")
that confers the ability to grow in the presence of mycophenolic acid. In some embodiments, 
the selectable marker gene is provided as part of a polycistronic sequence also encoding the 
protein of interest. 

In still other embodiments of the present invention, the retroviral vectors may comprise 
recombination elements recognized by a recombination system (e.g., the cre/loxP or flp
recombinase systems; See, e.g., Hoess et al., Nucleic Acids Res. 14:2287 [1986], O'Gorman
et al., Science 251:1351 [1991], van Deursen et al., Proc. Natl. Acad. Sci. USA 92:7376
[1995], and U.S. Pat. No. 6,025,192, incorporated herein by reference). After integration of
the vectors into the genome of the host cell, the host cell can be transiently transfected (e.g., 
by electroporation, lipofection, or microinjection) with either a recombinase enzyme (e.g., Cre 
recombinase) or a nucleic acid sequence encoding the recombinase enzyme and one or more 
nucleic acid sequences encoding a protein of interest flanked by sequences recognized by the 
recombination enzyme so that the nucleic acid sequence of interest is inserted into the 
integrated vector. 

Viral vectors, including recombinant retroviral vectors, provide a more efficient means 
of transferring genes into cells, as compared to other techniques such as calcium phosphate- 
DNA co-precipitation or DEAE-dextran-mediated transfection, electroporation or 
microinjection of nucleic acids. The present invention is not limited to any
particular mechanism. Indeed, an understanding of the mechanism is not required to practice
the present invention. Nevertheless, it is believed that the efficiency of viral transfer is due in
part to the fact that the transfer of nucleic acid is a receptor-mediated process (i.e., the virus
binds to a specific receptor protein on the surface of the target cell). In addition, once inside
a cell, the virally transferred nucleic acid integrates in a controlled manner. This is in contrast
to nucleic acids transferred by other means (e.g., calcium phosphate-DNA co-precipitation),
which are typically subject to rearrangement and degradation. 

Example 1, below, describes several illustrative examples of retroviral vectors of the 
current invention. However, it is not intended that the present invention be limited to the 
vectors described in Example 1. Indeed, any suitable retroviral vectors containing the novel
elements of the present invention are contemplated. Furthermore, the elements described 
above find use in other vectors such as AAV vectors, transposon vectors, plasmids, bacterial 
artificial chromosomes, and yeast artificial chromosomes. 

III. Expression of Proteins 

In some embodiments of the present invention, the vectors and regulatory elements 
described above find use in the expression of one or more proteins. The present invention is 
not limited to the production of any particular protein. Indeed, the production of a wide
variety of proteins is contemplated, including, but not limited to, erythropoietin,
alpha-interferon, alpha-1 proteinase inhibitor, angiogenin, antithrombin III, beta-acid decarboxylase,
human growth hormone, bovine growth hormone, porcine growth hormone, human serum
albumin, beta-interferon, calf intestine alkaline phosphatase, cystic fibrosis transmembrane
regulator, Factor VIII, Factor IX, Factor X, insulin, lactoferrin, tissue plasminogen activator,
myelin basic protein, proinsulin, prolactin, hepatitis B antigen, immunoglobulins,
monoclonal antibody CTLA4 Ig, Tag 72 monoclonal antibody, Tag 72 single chain antigen
binding protein, protein C, cytokines and their receptors (e.g., tumor necrosis factor alpha and
beta), growth hormone releasing factor, parathyroid hormone, thyroid stimulating hormone,
lipoproteins, alpha-1-antitrypsin, follicle stimulating hormone, calcitonin, luteinizing hormone,
glucagon, von Willebrand's factor, atrial natriuretic factor, lung surfactant, urokinase,
bombesin, thrombin, hemopoietic growth factor, enkephalinase, human macrophage
inflammatory protein (MIP-1-alpha), serum albumins (e.g., mullerian-inhibiting substance),
relaxin A-chain, relaxin B-chain, prorelaxin, mouse gonadotropin-associated peptide,
beta-lactamase, DNase, inhibin, activin, vascular endothelial growth factor (VEGF), receptors
for hormones or growth factors, integrin, protein A or D, rheumatoid factors, neurotrophic
factors (e.g., bone-derived neurotrophic factor (BDNF)), neurotrophin-3, -4, -5, or -6 (NT-3,
NT-4, NT-5, or NT-6), nerve growth factors (e.g., NGF-beta), platelet-derived growth factor
(PDGF), fibroblast growth factors (e.g., aFGF and bFGF), epidermal growth factor (EGF),
transforming growth factor (TGF) (e.g., TGF-alpha and TGF-beta, including TGF-β1,
TGF-β2, TGF-β3, TGF-β4, or TGF-β5), insulin-like growth factor-I and -II (IGF-I and
IGF-II), des(1-3)-IGF-I (brain IGF-I), insulin-like growth factor binding proteins; CD proteins
(e.g., CD-3, CD-4, CD-8, and CD-19), osteoinductive factors, immunotoxins, bone
morphogenetic protein (BMP); interferons (e.g., interferon-alpha, -beta, and -gamma), colony
stimulating factors (CSFs) (e.g., M-CSF, GM-CSF, and G-CSF), interleukins (IL) (e.g., IL-1
to IL-10), superoxide dismutase, T-cell receptors, surface membrane proteins, decay
accelerating factor, viral antigens (e.g., a portion of the AIDS envelope), transport proteins,
homing receptors, addressins, regulatory proteins, antibodies, chimeric proteins (e.g.,
immunoadhesins), and fragments of any of the above-listed polypeptides. One skilled in the
art recognizes that the nucleic acid sequences for these proteins and their homologs are
available from public databases (e.g., GenBank).

In some embodiments, the vectors of the present invention are used to express more 
than one exogenous protein. For example, host cells may be transfected with vectors 
encoding different proteins of interest (e.g., cotransfection with one vector encoding a first 
protein of interest and a second vector encoding a second protein of interest). In other 
embodiments, more than one protein is expressed by arranging the nucleic acids encoding the 
different proteins of interest in a polycistronic sequence (e.g., bicistronic or tricistronic 
sequences). This arrangement is especially useful when expression of the different proteins of 
interest in a 1:1 molar ratio is desired (e.g., expression of the light and heavy chains of an 
immunoglobulin molecule). 

A. Expression of Protein in Cell Culture 

In some embodiments of the present invention, proteins are expressed in cell culture. 
In some embodiments, retroviral vectors are used to express protein in mammalian tissue 
culture host cells, including, but not limited to, rat fibroblast cells, bovine kidney cells, and 
human kidney cells, while in some preferred embodiments, protein is expressed in bovine 
mammary cells. The host cells are cultured according to methods known in the art; suitable 
culture conditions for mammalian cells are well known in the art (See e.g., J. Immunol. 
Methods 56:221 [1983], Animal Cell Culture: A Practical Approach 2nd Ed., Rickwood, D. 
and Hames, B. D., eds. Oxford University Press, New York [1992]). 

The present invention contemplates the transfection of a variety of host cells 
with integrating vectors. A number of mammalian host cell lines are known in the art. In 
general, these host cells are capable of growth and survival when placed in either monolayer 
culture or in suspension culture in a medium containing the appropriate nutrients and growth 
factors, as is described in more detail below. Typically, the cells are capable of expressing 
and secreting large quantities of a particular protein of interest into the culture medium. 
Examples of suitable mammalian host cells include, but are not limited to, Chinese hamster 
ovary cells (CHO-K1, ATCC CCL-61); bovine mammary epithelial cells (ATCC CRL 10274); 
monkey kidney CV1 line transformed by SV40 (COS-7, 
ATCC CRL 1651); human embryonic kidney line (293 or 293 cells subcloned for growth in 
suspension culture; see, e.g., Graham et al., J. Gen. Virol., 36:59 [1977]); baby hamster kidney 
cells (BHK, ATCC CCL 10); mouse Sertoli cells (TM4, Mather, Biol. Reprod. 23:243-251 
[1980]); monkey kidney cells (CV1, ATCC CCL 70); African green monkey kidney cells 
(VERO-76, ATCC CRL-1587); human cervical carcinoma cells (HELA, ATCC CCL 2); 
canine kidney cells (MDCK, ATCC CCL 34); buffalo rat liver cells (BRL 3A, ATCC CRL 
1442); human lung cells (WI38, ATCC CCL 75); human liver cells (Hep G2, HB 8065); 
mouse mammary tumor (MMT 060562, ATCC CCL51); TRI cells (Mather et al., Annals 
N.Y. Acad. Sci., 383:44-68 [1982]); MRC 5 cells; FS4 cells; rat fibroblasts (208F cells); 
MDBK cells (bovine kidney cells); and a human hepatoma line (Hep G2). 

In addition to mammalian cell lines, the present invention also contemplates the 
transfection of plant protoplasts with integrating vectors at a low or high multiplicity of 
infection. For example, the present invention contemplates a plant cell or whole plant 
comprising at least one integrated integrating vector, preferably a retroviral vector, and most 
preferably a pseudotyped retroviral vector. All plants that can be produced by regeneration 
from protoplasts can also be transfected using the process according to the invention (e.g., 
cultivated plants of the genera Solanum, Nicotiana, Brassica, Beta, Pisum, Phaseolus, 
Glycine, Helianthus, Allium, Avena, Hordeum, Oryza, Setaria, Secale, Sorghum, Triticum, 
Zea, Musa, Cocos, Cydonia, Pyrus, Malus, Phoenix, Elaeis, Rubus, Fragaria, Prunus, 
Arachis, Panicum, Saccharum, Coffea, Camellia, Ananas, Vitis or Citrus). In general, 
protoplasts are produced in accordance with conventional methods (See, e.g., U.S. Pat. Nos. 
4,743,548; 4,677,066; 5,149,645; and 5,508,184; all of which are incorporated herein by 
reference). Plant tissue may be dispersed in an appropriate medium having an appropriate 
osmotic potential (e.g., 3 to 8 wt. % of a sugar polyol) and one or more polysaccharide 
hydrolases (e.g., pectinase, cellulase, etc.), and the cell wall degradation allowed to proceed 
for a sufficient time to provide protoplasts. After filtration the protoplasts may be isolated by 
centrifugation and may then be resuspended for subsequent treatment or use. Regeneration of 
protoplasts kept in culture to whole plants is performed by methods known in the art (See, 
e.g., Evans et al., Handbook of Plant Cell Culture, 1: 124-176, MacMillan Publishing Co., 
New York [1983]; Binding, Plant Protoplasts, p. 21-37, CRC Press, Boca Raton [1985]; and 
Potrykus and Shillito, Methods in Enzymology, Vol. 118, Plant Molecular Biology, A. and H. 
Weissbach eds., Academic Press, Orlando [1986]). 

The present invention also contemplates the use of amphibian and insect host cell 
lines. Examples of suitable insect host cell lines include, but are not limited to, mosquito cell 
lines (e.g., ATCC CRL-1660). Examples of suitable amphibian host cell lines include, but are 
not limited to, toad cell lines (e.g., ATCC CCL-102). 

In preferred embodiments of the present invention, the host cell cultures are prepared 
in a medium suitable for the particular cell being cultured. Commercially available media 
such as Ham's F10 (Sigma, St. Louis, MO), Minimal Essential Medium (MEM, Sigma), 
RPMI-1640 (Sigma), and Dulbecco's Modified Eagle's Medium (DMEM, Sigma) are 
exemplary nutrient solutions. Suitable media are also described in U.S. Pat. Nos. 4,767,704; 
4,657,866; 4,927,762; 5,122,469 and U.S. Pat. No. 4,560,655; and PCT Publications WO 
90/03430; and WO 87/00195 (each of which is incorporated herein by reference). Any of 
these media may be supplemented as necessary, with hormones and/or other growth factors 
(e.g., insulin, transferrin, or epidermal growth factor), salts (e.g., sodium chloride, calcium, 
magnesium, and phosphate), buffers (e.g., HEPES), nucleosides (e.g., adenosine and 
thymidine), antibiotics (e.g., gentamycin (gentamicin)), trace elements (i.e., inorganic 
compounds usually present at final concentrations in the micromolar range), lipids (e.g., 
linoleic or other fatty acids) and their suitable carriers, and glucose or an equivalent energy 
source. Any other necessary supplements may also be included at appropriate concentrations 
known to those skilled in the art. For mammalian cell culture, the osmolality of the culture 
medium is generally about 290-330 mOsm. 

The present invention also contemplates the use of a variety of culture systems (e.g., 
petri dishes, 96 well plates, roller bottles, and bioreactors) for the growth and expression of 
host cells. For example, the host cells can be cultured in a perfusion system. Perfusion 
culture refers to providing a continuous flow of culture medium through a culture maintained 
at high cell density. The cells are suspended and do not require a solid support upon which 
to grow. Generally, fresh nutrients must be supplied continuously with concomitant removal 
of toxic metabolites and, ideally, selective removal of dead cells. Filtering, entrapment and 
micro-capsulation methods are all suitable for refreshing the culture environment at sufficient 
rates. 

In alternative embodiments, a fed batch culture procedure is employed. In the 
preferred fed batch culture method the mammalian host cells and culture medium are supplied 
to a culturing vessel initially and additional culture nutrients are fed, continuously or in 
discrete increments, to the culture during culturing, with or without periodic cell and/or 
product harvest before termination of culture. In some embodiments, the fed batch culture is 
a semi-continuous fed batch culture in which the whole culture (including cells and medium) 
is removed from the growth vessel and replaced by fresh medium. Fed batch culture is 
distinguished from simple batch culture in which all components for cell culturing (including 
the cells and all culture nutrients) are supplied to the culturing vessel at the start of the 
culturing process. Fed batch culture can be further distinguished from perfusion culturing 
insofar as the supernate is not removed from the culturing vessel during the process (in 
perfusion culturing, the cells are restrained in the culture (e.g., by filtration, encapsulation, 
anchoring to microcarriers etc.) and the culture medium is continuously or intermittently 
introduced and removed from the culturing vessel). 

Further, the cells of the culture may be propagated according to any scheme or routine 
suitable for the particular host cell and the particular production plan contemplated. 
Therefore, the present invention contemplates single step, as well as multiple step culture 
procedures. In a single step culture, the host cells are inoculated into a culture environment 
and the processes of the instant invention are employed during a single production phase of 
the cell culture. In the multi-stage culture procedure, cells are cultivated in a number of steps 
or phases. For instance, cells may be grown in a first step or growth phase culture wherein 
cells, possibly removed from storage, are inoculated into a medium suitable for promoting 
growth and high viability. The cells may be maintained in the growth phase for a suitable 
period of time by the addition of fresh medium to the host cell culture. 

Fed batch or continuous cell culture conditions are contemplated in order to enhance 
growth of the mammalian cells in the growth phase of the cell culture. In the growth phase, 
cells are grown under conditions and for a period of time that is optimized for growth. 
Culture conditions, such as temperature, pH, dissolved oxygen (dO2) and the like, are those 
used with the particular host and are apparent to the ordinarily skilled artisan. Generally, the 
pH is adjusted to a level between about 6.5 and 7.5 using either an acid (e.g., CO2) or a base 
(e.g., Na2CO3 or NaOH). A suitable temperature range for culturing mammalian cells (e.g., 
CHO cells) is between about 30° to 38° C and a suitable dO2 is between 5-90% of air 
saturation. 
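
The ranges quoted above (pH, temperature, dO2) and the osmolality range given earlier for mammalian culture media can be collected into a simple setpoint check. The following is a minimal sketch only: the dictionary keys and the function name are hypothetical, and because the text qualifies the ranges with "about", the hard limits used here are an illustrative simplification rather than a statement of the invention.

```python
# Approximate ranges taken from the description; key names are illustrative.
RANGES = {
    "ph": (6.5, 7.5),
    "temperature_c": (30.0, 38.0),
    "do2_percent_air_sat": (5.0, 90.0),
    "osmolality_mosm": (290.0, 330.0),
}


def out_of_range(setpoints):
    """Return the sorted names of setpoints that fall outside the quoted ranges."""
    flagged = []
    for name, value in setpoints.items():
        low, high = RANGES[name]
        if not (low <= value <= high):
            flagged.append(name)
    return sorted(flagged)


# Hypothetical growth-phase setpoints for a CHO culture.
typical = {"ph": 7.0, "temperature_c": 37.0,
           "do2_percent_air_sat": 50.0, "osmolality_mosm": 300.0}
too_warm = dict(typical, temperature_c=39.5)

print(out_of_range(typical))
print(out_of_range(too_warm))
```

A check of this kind only flags values outside the stated windows; the optimal setpoint within each window remains specific to the host cell, as the text notes.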

Following the polypeptide production phase, the polypeptide of interest is recovered 
from the culture medium using well-established techniques. Preferably, the protein of interest 
is recovered from the culture medium as a secreted polypeptide (e.g., the secretion of the 
protein of interest is directed by a signal peptide sequence), although it also may be recovered 
from host cell lysates. As a first step, the culture medium or lysate is centrifuged to remove 
particulate cell debris. The polypeptide is then purified from contaminant soluble proteins and 
polypeptides using any suitable method. Suitable purification methods include, but are not 
limited to fractionation on immunoaffinity or ion-exchange columns; ethanol precipitation; 
reverse phase HPLC; chromatography on silica or on an anion-exchange resin such as DEAE; 
chromatofocusing; SDS-PAGE; ammonium sulfate precipitation; gel filtration using, e.g., 
Sephadex G-75; and protein A Sepharose columns to remove contaminants such as IgG. A 
protease inhibitor such as phenyl methyl sulfonyl fluoride (PMSF) also may be useful to 
inhibit proteolytic degradation during purification. Additionally, the protein of interest can be 
fused in frame to a marker sequence which allows for purification of the protein of interest. 
Non-limiting examples of marker sequences include a hexahistidine tag which may be 
supplied by a vector, preferably a pQE-9 vector, and a hemagglutinin (HA) tag. The HA tag 
corresponds to an epitope derived from the influenza hemagglutinin protein (See e.g., Wilson 
et al., Cell, 37:767 [1984]). One skilled in the art appreciates that purification methods 
suitable for the polypeptide of interest may require modification to account for changes in the 
character of the polypeptide upon expression in recombinant cell culture. 

B. Expression of Proteins in Animals 

In some embodiments of the present invention, the host cell utilized for expression of 
the protein of interest is part of a mammal. In preferred embodiments, the mammal is a 
transgenic bovine. The transgenic bovine may be produced by any suitable method (See e.g., 
Chan et al, PNAS, 95:14028 [1998]; U.S. Patent 5,741,957 (incorporated herein by 
reference); and Pursel et al., Science, 244:1281 [1989]). In particularly preferred 
embodiments, the protein is expressed in the mammary gland of a bovine and secreted in the 
milk of the bovine. In embodiments where proteins are expressed in the milk of a bovine, 
promoters and signal sequences for tissue specific expression and secretion are utilized, 
including, but not limited to, bovine/human α-lactalbumin promoter and bovine α-lactalbumin 
signal sequence. The protein of interest may be recovered from bovine milk using any 
suitable method, including but not limited to, those described above for the recovery of 
protein from cell cultures. 

Those skilled in the art recognize that the vectors of the present invention will find use 
in the production of other transgenic animals as well, including, but not limited to, mice, 
goats, pigs, birds and rabbits (See e.g., U.S. Pat. Nos. 5,523,226; 5,453,457; 4,873,191; 
4,736,866; each of which is herein incorporated by reference). 

C. Expression of Antibodies 

In some embodiments of the present invention, single vectors are utilized for the 
expression of two or more proteins, including individual subunits of multisubunit proteins. In 
some embodiments, two or more chains of an immunoglobulin (e.g., one heavy chain (γ, α, 
μ, δ, or ε) and one light chain (κ or λ)), separated by an IRES sequence, are expressed from 
the same vector as a single transcriptional unit. The present invention is not limited to any 
particular vector. Indeed, the use of a variety of vectors is contemplated, including, but not 
limited to plasmids, cosmids, bacterial artificial chromosomes, yeast artificial chromosomes, 
adeno-associated virus vectors, and adenovirus vectors. Large numbers of suitable vectors are 
known to those of skill in the art, and are commercially available. Such vectors include, but 
are not limited to, the following vectors: 1) Bacterial - pQE70, pQE60, pQE-9 (Qiagen), 
pBS, pD10, phagescript, psiX174, pbluescript SK, pBSKS, pNH8A, pNH16a, pNH18A, 
pNH46A (Stratagene); ptrc99a, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia); and 2) 
Eukaryotic - pWLNEO, pSV2CAT, pOG44, pXT1, pSG (Stratagene); pSVK3, pBPV, pMSG, 
pSVL (Pharmacia). Any other plasmid or vector may be used as long as it is replicable 
and viable in the host. In some preferred embodiments of the present invention, mammalian 
expression vectors comprise an origin of replication, a suitable promoter and enhancer, and 
also any necessary ribosome binding sites, polyadenylation sites, splice donor and acceptor 
sites, transcriptional termination sequences, and 5' flanking non-transcribed sequences. In 
other embodiments, DNA sequences derived from the SV40 splice and polyadenylation sites 
may be used to provide the required non-transcribed genetic elements. 

In certain embodiments of the present invention, the DNA sequence in the expression 
vector is operatively linked to an appropriate expression control sequence(s) (promoter) to 
direct mRNA synthesis. Promoters useful in the present invention include, but are not limited 
to, the LTR or SV40 promoter, the E. coli lac or trp, the phage lambda PL and PR, T3 and 
T7 promoters, and the cytomegalovirus (CMV) immediate early, herpes simplex virus (HSV) 
thymidine kinase, and mouse metallothionein-I promoters and other promoters known to 
control expression of genes in prokaryotic or eukaryotic cells or their viruses. In other 
embodiments of the present invention, recombinant expression vectors include origins of 
replication and selectable markers permitting transformation of the host cell (e.g., 
dihydrofolate reductase or neomycin resistance for eukaryotic cell culture, or tetracycline or 
ampicillin resistance in E. coli). 

In some embodiments of the present invention, transcription of the DNA encoding the 
polypeptides of the present invention by higher eukaryotes is increased by inserting an 
enhancer sequence into the vector. Enhancers are cis-acting elements of DNA, usually from 
about 10 to 300 bp, that act on a promoter to increase its transcription. Enhancers useful in the 
present invention include, but are not limited to, the SV40 enhancer on the late side of the 
replication origin bp 100 to 270, a cytomegalovirus early promoter enhancer, the polyoma 
enhancer on the late side of the replication origin, and adenovirus enhancers. 

In other embodiments, the expression vector also contains a ribosome binding site for 
translation initiation and a transcription terminator. In still other embodiments of the present 
invention, the vector may also include appropriate sequences for amplifying expression. 

In some particularly preferred embodiments, retroviral vectors are used to express 
immunoglobulins. In some embodiments, retroviral vectors for expression of 
immunoglobulins contain regulatory elements. In some preferred embodiments of the present 
invention, two immunoglobulin chains are expressed in the same retrovirus vector construct 
separated by an IRES sequence. In some particularly preferred embodiments, the two chains 
are separated by an IRES/α-LA signal sequence. In other embodiments, the vector further 
contains RNA export elements. In further embodiments, the RNA export element is a WPRE. 
In still other embodiments, the PPE element is at least one Mertz sequence. In some 
preferred embodiments, the PPE element is mutated to remove start signals. In other 
preferred embodiments, two PPE elements are placed in a head to tail array separated by a 
linker. 

In preferred embodiments, expression of immunoglobulins by the vectors of the 
current invention is controlled by a promoter. In some embodiments, expression is controlled 
by a CMV promoter, while in other embodiments, expression is controlled by a MMTV 
promoter. In some preferred embodiments, expression is controlled by a hybrid bovine/human 
α-LA promoter. 

In some embodiments of the present invention, heavy and light chains are expressed 
by the vectors of the current invention at a ratio of about 0.7:1.3. In preferred embodiments, 
heavy and light chains are expressed at a ratio of about 0.8:1.2. In particularly preferred 
embodiments, heavy and light chains are expressed at a ratio of about 0.9:1.1. In still more 
preferred embodiments, heavy and light chains are expressed at a ratio of about 1:1. In 
particularly preferred embodiments, the majority (e.g., greater than 90%, preferably greater 
than 95%, and most preferably greater than about 99%) of the heavy and light chains are 
correctly assembled in a ratio of 1:1 to form a functional (e.g., able to bind an antigen) 
antibody. 

In illustrative examples of the present invention, immunoglobulins are expressed in a 
host cell comprising the vectors and elements described above. In some illustrative examples 
(See e.g., Examples 6, 8, and 12), the vectors described in Example 1 are used to express a 
variety of immunoglobulins in a variety of cell lines. In general, this expression led to the 
formation of functional, tetrameric immunoglobulins. 

D. Expression of Other Proteins 

The vectors of the present invention are also useful for expressing G-protein coupled 
receptors (GPCRs) and other transmembrane proteins. It is contemplated that when these 
proteins are expressed, they are correctly inserted into the membrane in their native 
conformation. Thus, GPCRs and other transmembrane proteins may be purified as part of a 
membrane fraction or purified from the membranes by methods known in the art. 

Furthermore, the vectors of the present invention are useful for co-expressing a protein 
of interest for which there is no assay or for which assays are difficult. In this system, a 
protein of interest and a signal protein are arranged in a polycistronic sequence. Preferably, 
an IRES sequence separates the signal protein and protein of interest (e.g., a GPCR) and the 
genes encoding the signal protein and protein of interest are expressed as a single 
transcriptional unit. The present invention is not limited to any particular signal protein. 
Indeed, the use of a variety of signal proteins for which easy assays exist is contemplated. 
These signal proteins include, but are not limited to, green fluorescent protein, luciferase, 
beta-galactosidase, and antibody heavy or light chains. It is contemplated that when the signal 
protein and protein of interest are co-expressed from a polycistronic sequence, the presence of 
the signal protein is indicative of the presence of the protein of interest. Accordingly, in 
some embodiments, the present invention provides methods for indirectly detecting the 
expression of protein of interest comprising providing a host cell transfected with a vector 
encoding a polycistronic sequence, wherein the polycistronic sequence comprises a signal 
protein and a protein of interest operably linked by an IRES, and culturing the host cells 
under conditions such that the signal protein and protein of interest are produced, wherein the 
presence of the signal protein indicates the presence of the protein of interest. 

EXPERIMENTAL 

The following examples serve to illustrate certain preferred embodiments and aspects 
of the present invention and are not to be construed as limiting the scope thereof. 

In the experimental disclosure which follows, the following abbreviations apply: M 
(molar); mM (millimolar); μM (micromolar); nM (nanomolar); mol (moles); mmol 
(millimoles); μmol (micromoles); nmol (nanomoles); gm (grams); mg (milligrams); μg 
(micrograms); pg (picograms); L (liters); ml (milliliters); μl (microliters); cm (centimeters); 
mm (millimeters); μm (micrometers); nm (nanometers); °C (degrees Centigrade); AMP 
(adenosine 5'-monophosphate); BSA (bovine serum albumin); cDNA (copy or complementary 
DNA); CS (calf serum); DNA (deoxyribonucleic acid); ssDNA (single stranded DNA); 
dsDNA (double stranded DNA); dNTP (deoxyribonucleotide triphosphate); LH (luteinizing 
hormone); NIH (National Institutes of Health, Bethesda, MD); RNA (ribonucleic acid); PBS 
(phosphate buffered saline); g (gravity); OD (optical density); HEPES 
(N-[2-Hydroxyethyl]piperazine-N'-[2-ethanesulfonic acid]); HBS (HEPES buffered saline); 
SDS (sodium dodecylsulfate); Tris-HCl 
(tris[Hydroxymethyl]aminomethane-hydrochloride); Klenow (DNA polymerase I large 
(Klenow) fragment); rpm (revolutions per minute); EGTA (ethylene glycol-bis(β-aminoethyl 
ether) N,N,N',N'-tetraacetic acid); EDTA (ethylenediaminetetraacetic acid); bla (β-lactamase 
or ampicillin-resistance gene); ORI (plasmid origin of replication); lacI (lac repressor); X-gal 
(5-bromo-4-chloro-3-indolyl-β-D-galactoside); ATCC (American Type Culture Collection, 
Rockville, MD); GIBCO/BRL (GIBCO/BRL, Grand Island, NY); Perkin-Elmer (Perkin-Elmer, 
Norwalk, CT); and Sigma (Sigma Chemical Company, St. Louis, MO). 

Example 1 

Vector Construction 

The following Example describes the construction of vectors used in the experiments 
below. 

A. CMV MN14 

The CMV MN14 vector (SEQ ID NO:4; MN14 antibody is described in U.S. Pat. No. 
5,874,540, incorporated herein by reference) comprises the following elements, arranged in 5' 
to 3' order: CMV promoter; MN14 heavy chain signal peptide; MN14 antibody heavy chain; 
IRES from encephalomyocarditis virus; bovine α-lactalbumin signal peptide; MN14 antibody 
light chain; and 3' MoMuLV LTR. In addition to sequences described in SEQ ID NO:4, the 
CMV MN14 vector further comprises a 5' MoMuLV LTR, a MoMuLV extended viral 
packaging signal, and a neomycin phosphotransferase gene (these additional elements are 
provided in SEQ ID NO:7; the 5' LTR is derived from Moloney Murine Sarcoma Virus in 
each of the constructs described herein, but is converted to the MoMuLV 5' LTR when 
integrated). 

This construct uses the 5' MoMuLV LTR to control production of the neomycin 
phosphotransferase gene. The expression of MN14 antibody is controlled by the CMV 
promoter. The MN14 heavy chain gene and light chain gene are attached together by an 
IRES sequence. The CMV promoter drives production of an mRNA containing the heavy 
chain gene and the light chain gene attached by the IRES. Ribosomes attach to the mRNA at 
the CAP site and at the IRES sequence. This allows both heavy and light chain protein to be 
produced from a single mRNA. The mRNA expression from the LTR as well as from the 
CMV promoter is terminated and polyadenylated in the 3' LTR. The construct was cloned 
by methods similar to those described in section B below. 

The IRES sequence (SEQ ID NO:3) comprises a fusion of the IRES from the plasmid 
pLXIN (Clontech) and the bovine α-lactalbumin signal peptide. The initial ATG of the signal 
peptide was attached to the IRES to allow the most efficient translation initiation from the 
IRES. The 3' end of the signal peptide provides a multiple cloning site allowing easy 
attachment of any protein of interest to create a fusion protein with the signal peptide. The 
IRES sequence can serve as a translational enhancer as well as creating a second translation 
initiation site that allows two proteins to be produced from a single mRNA. 

The IRES-bovine α-lactalbumin signal peptide was constructed as follows. The 
portion of the plasmid pLXIN (Clontech, Palo Alto, CA) containing the EMCV IRES was 
PCR amplified using the following primers. 

Primer 1 (SEQ ID NO: 35): 

5' GATCCACTAGTAACGGCCGCCAGAATTCGC 3' 
Primer 2 (SEQ ID NO: 36): 

5' CAGAGAGACAAAGGAGGCCATATTATCATCGTGTTTTTCAAAG 3' 

Primer 2 attaches a tail corresponding to the start of the bovine α-lactalbumin signal 
peptide coding region to the IRES sequence. In addition, the second triplet codon of the 
α-lactalbumin signal peptide was mutated from ATG to GCC to allow efficient translation 
from the IRES sequence. This mutation results in a methionine to alanine change in the 
protein sequence. This mutation was performed because the IRES prefers an alanine as the 
second amino acid in the protein chain. The resulting IRES PCR product contains an EcoRI 
site on the 5' end of the fragment (just downstream of Primer 1 above). 
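
The codon change described above can be checked directly from the Primer 2 sequence. The sketch below reverse-complements Primer 2 (SEQ ID NO: 36) to obtain the sense strand and translates the signal peptide tail using the standard genetic code. Treating the last ATG in the primer as the signal peptide start codon is an assumption made for illustration, and the helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def revcomp(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from the first base; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


PRIMER_2 = "CAGAGAGACAAAGGAGGCCATATTATCATCGTGTTTTTCAAAG"  # SEQ ID NO: 36

sense = revcomp(PRIMER_2)
start = sense.rfind("ATG")              # assumed start codon of the signal peptide
second_codon = sense[start + 3:start + 6]
tail_peptide = translate(sense[start:])

print(sense)
print(second_codon, tail_peptide)
```

Under these assumptions the second codon reads GCC and the encoded tail begins Met-Ala, consistent with the methionine to alanine change stated in the text.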

Next, the α-lactalbumin signal peptide containing sequence was PCR amplified from 
the α-LA Signal Peptide vector construct using the following primers. 

Primer 3 (SEQ ID NO: 14): 

5' GTTTGAAAAAGAGGATGATAATATGGGGTGGTTTGTGTGTGTG 3' 
Primer 4 (SEQ ID NO: 15): 

5' TTCGCGAGCTCGAGATCTAGATATCCCATG 3' 

Primer 3 attaches a tail corresponding to the 3' end of the IRES sequence to the 
α-lactalbumin signal peptide coding region. As stated above, the second triplet codon 
of the bovine α-lactalbumin signal peptide was mutated to allow efficient translation from the 
IRES sequence. The resulting signal peptide PCR fragment contains NaeI, NcoI, EcoRV, 
XbaI, BglII and XhoI sites on the 3' end. 

After the IRES and signal peptide were amplified individually using the primers shown 
above, the two reaction products were mixed and PCR was performed using primer 1 and 
primer 4. The resultant product of this reaction is a spliced fragment that contains the IRES 
attached to the full length α-lactalbumin signal peptide. The ATG encoding the start of the 
signal peptide is placed at the same location as the ATG encoding the start of the neomycin 
phosphotransferase gene found in the vector pLXIN. The fragment also contains the EcoRI 
site on the 5' end and NaeI, NcoI, EcoRV, XbaI, BglII and XhoI sites on the 3' end. 

The spliced IRES/α-lactalbumin signal peptide PCR fragment was digested with EcoRI 
and XhoI. The α-LA Signal Peptide vector construct was also digested with EcoRI and XhoI. 
These two fragments were ligated together to give the pIRES construct. 

The IRES/α-lactalbumin signal peptide portion of the pIRES vector was sequenced and 
found to contain mutations in the 5' end of the IRES. These mutations occur in a long 
stretch of G's and were found in all clones that were isolated. 

To repair this problem, pLXIN DNA was digested with EcoRI and BsmFI. The 500 bp 
band corresponding to a portion of the IRES sequence was isolated. The mutated 
IRES/α-lactalbumin signal peptide construct was also digested with EcoRI and BsmFI and the 
mutated IRES fragment was removed. The IRES fragment from pLXIN was then substituted 
for the IRES fragment of the mutated IRES/α-lactalbumin signal peptide construct. The 
IRES/α-LA signal peptide portion of the resulting plasmid was then verified by DNA sequencing. 

The resulting construct was found to have a number of sequence differences when 
compared to the expected pLXIN sequence obtained from Clontech. We also sequenced the 
IRES portion of pLXIN purchased from Clontech to verify its sequence. The differences 
from the expected sequence also appear to be present in the pLXIN plasmid that we obtained 
from Clontech. Four sequence differences were identified: 

bp 347 T - was G in pLXIN sequence 
bp 786-788 ACG - was GC in LXIN sequence. 

B. CMV LL2 

The CMV LL2 (SEQ ID NO:5; LL2 antibody is described in U.S. Pat. No. 6,187,287, 
incorporated herein by reference) construct comprises the following elements, arranged in 5' 
to 3' order: 5' CMV promoter (Clontech), LL2 heavy chain signal peptide, LL2 antibody 
heavy chain; IRES from encephalomyocarditis virus; bovine α-LA signal peptide; LL2 
antibody light chain; and 3' MoMuLV LTR. In addition to sequences described in SEQ ID 
NO:5, the CMV LL2 vector further comprises a 5' MoMuLV LTR, a MoMuLV extended 
viral packaging signal, and a neomycin phosphotransferase gene (these additional elements are 
provided in SEQ ID NO:7). 

This construct uses the 5' MoMuLV LTR to control production of the neomycin 
phosphotransferase gene. The expression of LL2 antibody is controlled by the CMV promoter 
(Clontech). The LL2 heavy chain gene and light chain gene are attached together by an IRES 
sequence. The CMV promoter drives production of an mRNA containing the heavy chain gene 
and the light chain gene attached by the IRES. Ribosomes attach to the mRNA at the CAP 
site and at the IRES sequence. This allows both heavy and light chain protein to be produced 
from a single mRNA. The mRNA expression from the LTR as well as from the CMV 
promoter is terminated and polyadenylated in the 3' LTR. 

The IRES sequence (SEQ ID NO:3) comprises a fusion of the IRES from the plasmid 
pLXIN (Clontech) and the bovine alpha-lactalbumin signal peptide. The initial ATG of the 
signal peptide was attached to the IRES to allow the most efficient translation initiation from 
the IRES. The 3' end of the signal peptide provides a multiple cloning site allowing easy 
attachment of any protein of interest to create a fusion protein with the signal peptide. The 
IRES sequence can serve as a translational enhancer as well as creating a second translation 
initiation site that allows two proteins to be produced from a single mRNA. 

The LL2 light chain gene was attached to the IRES α-lactalbumin signal peptide as 
follows. The LL2 light chain was PCR amplified from the vector pCRLL2 using the 
following primers. 

Primer 1 (SEQ ID NO: 16): 

5' CTACAGGTGTCCACGTCGACATCCAGCTGACCCAG 3' 
Primer 2 (SEQ ID NO: 17): 

5' CTGCAGAATAGATCTCTAACACTCTCCCCTGTTG 3' 

Primer 1 adds a HincII site right at the start of the coding region for mature LL2 
light chain. Digestion of the PCR product with HincII gives a blunt end fragment starting 
with the initial GAC encoding mature LL2 on the 5' end. Primer 2 adds a BglII site to the 3' 
end of the gene right after the stop codon. The resulting PCR product was digested with 
HincII and BglII and cloned directly into the IRES-Signal Peptide plasmid that was digested 
with NaeI and BglII. 
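
The primer design described above can be verified from the two light chain primer sequences. The following sketch assumes the well-known HincII specificity (recognition sequence GTYRAC, cut between Y and R to leave blunt ends) and locates the GTCGAC instance in Primer 1 (SEQ ID NO: 16). It then translates the blunt 5' end and checks that the sense strand of Primer 2 (SEQ ID NO: 17) places a stop codon directly before the BglII site (AGATCT). Helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def revcomp(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from the first base; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


PRIMER_1 = "CTACAGGTGTCCACGTCGACATCCAGCTGACCCAG"  # SEQ ID NO: 16
PRIMER_2 = "CTGCAGAATAGATCTCTAACACTCTCCCCTGTTG"   # SEQ ID NO: 17

site = PRIMER_1.find("GTCGAC")         # HincII site (GTYRAC with Y=C, R=G)
blunt_5prime = PRIMER_1[site + 3:]     # blunt cut falls between GTC and GAC
mature_start = translate(blunt_5prime)

sense_3prime = revcomp(PRIMER_2)
stop_next_to_bglii = ("TAG" + "AGATCT") in sense_3prime

print(blunt_5prime[:3], mature_start)
print(stop_next_to_bglii)
```

With these inputs the blunt fragment begins with the codon GAC, so the first residue of the mature chain is aspartate, and the TAG stop codon sits immediately 5' of the BglII site.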

The Kozak sequence of the LL2 heavy chain gene was then modified. The vector 
pCRMNMHC was digested with XhoI and AvrII to remove about a 400 bp fragment. PCR 
was then used to amplify the same portion of the LL2 heavy chain construct that was 
removed by the XhoI-AvrII digestion. This amplification also mutated the 5' end of the gene 
to add a better Kozak sequence to the clone. The Kozak sequence was modified to resemble 
the typical IgG Kozak sequence. The PCR primers are shown below. 



Primer 1 (SEQ ID NO: 18): 
5' CAGTGTGATCTCGAGAATTCAGGACCTCACCATGGGATGGAGCTGTATCAT 3' 

Primer 2 (SEQ ID NO: 19): 
5' AGGCTGTATTGGTGGATTCGTCT 3' 

The PCR product was digested with XhoI and AvrII and inserted back into the 
previously digested plasmid backbone. 
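
The improved Kozak context introduced by Primer 1 (SEQ ID NO: 18) can be illustrated with a 
minimal Python sketch. It assumes that the first ATG in the primer is the start codon and uses 
the common rule of thumb that a strong context has a purine at position -3 and a G at position 
+4 (with the A of ATG as +1). The function name kozak_context is hypothetical. 

```python
def kozak_context(seq):
    """Report the bases at -3 and +4 around the first ATG (A of ATG = +1).

    Returns None if no ATG is found or the flanking positions are missing.
    """
    atg = seq.find("ATG")
    if atg < 3 or atg + 3 >= len(seq):
        return None
    minus3, plus4 = seq[atg - 3], seq[atg + 3]
    return {
        "atg_index": atg,
        "minus3": minus3,
        "plus4": plus4,
        # Rule of thumb: purine at -3 and G at +4 give a strong context.
        "strong": minus3 in "AG" and plus4 == "G",
    }


primer1 = "CAGTGTGATCTCGAGAATTCAGGACCTCACCATGGGATGGAGCTGTATCAT"  # SEQ ID NO: 18
print(kozak_context(primer1))
```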

The heavy chain gene carrying the "good" Kozak sequence was then joined to the light chain 
gene. The "good" Kozak LL2 heavy chain gene construct was digested with EcoRI and the heavy 
chain gene containing fragment was isolated. The IRES α-Lactalbumin Signal Peptide LL2 light 
chain gene construct was also digested with EcoRI. The heavy chain gene was then cloned into 
the EcoRI site of the IRES light chain construct. This resulted in the heavy chain gene being 
placed at the 5' end of the IRES sequence. 

Next, a multiple cloning site was added into the LNCX retroviral backbone plasmid. 
The LNCX plasmid was digested with HindIII and ClaI. Two oligonucleotide primers were 
produced and annealed together to create a double stranded DNA multiple cloning site. The 
following primers were annealed together. 

Primer 1 (SEQ ID NO: 20): 

5' AGCTTCTCGAGTTAACAGATCTAGGCCTCCTAGGTCGACAT 3' 
Primer 2 (SEQ ID NO: 21): 

5' CGATGTCGACCTAGGAGGCCTAGATCTGTTAACTCGAGA 3' 

After annealing, the multiple cloning site was ligated into LNCX to create LNC-MCS. 
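
The way the two oligonucleotides (SEQ ID NOs: 20 and 21) pair can be illustrated with a short 
Python sketch. It checks that the strands are complementary, reports the single-stranded 
overhangs left at each end, and lists which common recognition sequences occur in the duplex. 
The 4-base top-strand overhang is an assumption based on the HindIII digestion of LNCX; the 
site table and the function names are hypothetical and for illustration only. 

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def duplex_and_overhangs(top, bottom, top_overhang_len):
    """Split an annealed pair into (top 5' overhang, duplex, bottom 5' overhang).

    Returns None if the strands are not complementary past the top overhang.
    """
    duplex = top[top_overhang_len:]
    if not reverse_complement(bottom).startswith(duplex):
        return None
    bottom_overhang_len = len(bottom) - len(duplex)
    return top[:top_overhang_len], duplex, bottom[:bottom_overhang_len]


top = "AGCTTCTCGAGTTAACAGATCTAGGCCTCCTAGGTCGACAT"   # SEQ ID NO: 20
bottom = "CGATGTCGACCTAGGAGGCCTAGATCTGTTAACTCGAGA"  # SEQ ID NO: 21

# Recognition sequences of some common enzymes (illustrative selection).
SITES = {"XhoI": "CTCGAG", "HpaI": "GTTAAC", "BglII": "AGATCT",
         "StuI": "AGGCCT", "AvrII": "CCTAGG", "SalI": "GTCGAC"}

result = duplex_and_overhangs(top, bottom, 4)
sites_found = [name for name, site in SITES.items() if site in top]
print(result)
print(sites_found)
```

The two reported overhangs are the sticky ends expected to pair with HindIII- and ClaI-cut 
vector ends, respectively. 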

Next, the double chain gene fragment was ligated into the retroviral backbone gene 
construct. The double chain gene construct created above was digested with SalI and BglII 
and the double chain containing fragment was isolated. The retroviral expression plasmid 
LNC-MCS was digested with XhoI and BglII. The double chain fragment was then cloned 
into the LNC-MCS retroviral expression backbone. 

Next, an RNA splicing problem in the construct was corrected. The construct was 
digested with NsiI. The resulting fragment was then partially digested with EcoRI. The 
fragments resulting from the partial digest that were approximately 9300 base pairs in size 
were gel purified. A linker was created to mutate the splice donor site at the 3' end of the 
LL2 heavy chain gene. The linker was again created by annealing two oligonucleotide 
primers together to form the double stranded DNA linker. The two primers used to create the 
linker are shown below. 

Primer 1 (SEQ ID NO: 22): 

5'CGAGGCTCTGCACAACCACTACACGCAGAAGAGCCTCTCCCTGTCTCCCGGGAAAT 
GAAAGCCG 3' 

Primer 2 (SEQ ID NO: 23): 

5'AATTCGGCTTTCATTTCCCGGGAGACAGGGAGAGGCTCTTCTGCGTGTAGTGGTTG 
TGCAGAGCCTCGTGCA 3' 

After annealing, the linker was substituted for the original NsiI/EcoRI fragment that was 
removed during the partial digestion. 
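
To illustrate that the linker can change the nucleotide sequence at the 3' end of the heavy 
chain gene while still encoding an intact C-terminus, the following Python sketch translates 
the top strand of the linker (SEQ ID NO: 22) with the standard genetic code. The reading-frame 
offset of 1 is an assumption, chosen because it yields an open reading frame that ends in a 
stop codon; the source does not state the frame. The function name is hypothetical. 

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_until_stop(seq, frame_offset):
    """Translate seq from frame_offset and stop at the first stop codon."""
    protein = []
    for i in range(frame_offset, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


linker_top = ("CGAGGCTCTGCACAACCACTACACGCAGAAGAGCCTCTCCCTGTCTCCCGGGAAAT"
              "GAAAGCCG")  # SEQ ID NO: 22

# Assumed frame: the coding sequence starts at the second base of the linker.
print(translate_until_stop(linker_top, 1))
```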

C. MMTV MN14 

The MMTV MN14 (SEQ ID NO:6) construct comprises the following elements, 
arranged in 5' to 3' order: 5' MMTV promoter; double mutated PPE sequence; MN14 
antibody heavy chain; IRES from encephalomyocarditis virus; bovine αLA signal peptide; 
MN14 antibody light chain; WPRE sequence; and 3' MoMuLV LTR. In addition to the 
sequences described in SEQ ID NO:6, the MMTV MN14 vector further comprises a 
MoMuLV LTR, a MoMuLV extended viral packaging signal, and a neomycin phosphotransferase 
gene located 5' of the MMTV promoter (these additional elements are provided in SEQ ID 
NO: 7). 

This construct uses the 5' MoMuLV LTR to control production of the neomycin 
phosphotransferase gene. The expression of MN14 antibody is controlled by the MMTV 
promoter (Pharmacia). The MN14 heavy chain gene and light chain gene are attached 
together by an IRES/bovine α-LA signal peptide sequence (SEQ ID NO: 3). The MMTV 
promoter drives production of an mRNA containing the heavy chain gene and the light chain 
gene attached by the IRES/bovine α-LA signal peptide sequence. Ribosomes attach to the 
mRNA at the CAP site and at the IRES/bovine α-LA signal peptide sequence. This allows 
both heavy and light chain protein to be produced from a single mRNA. In addition, there 
are two genetic elements contained within the mRNA to aid in export of the mRNA from the 
nucleus to the cytoplasm and aid in poly-adenylation of the mRNA. The PPE sequence is 
contained between the RNA CAP site and the start of the MN14 protein coding region; the 
WPRE is contained between the end of MN14 protein coding and the poly-adenylation site. 
The mRNA expression from the LTR as well as from the MMTV promoter is terminated and 
poly-adenylated in the 3' LTR. 

ATG sequences within the PPE element (SEQ ID NO:2) were mutated to prevent 
potential unwanted translation initiation. Two copies of this mutated sequence were used in a 
head-to-tail array. This sequence is placed just downstream of the promoter and upstream of 
the Kozak sequence and signal peptide-coding region. The WPRE is isolated from 
woodchuck hepatitis virus and also aids in the export of mRNA from the nucleus and in 
stabilizing the mRNA. If this sequence is included in the 3' untranslated region of the RNA, 
the level of protein expression from this RNA increases up to 10-fold. 
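
The arrangement of elements in the MMTV MN14 transcription unit can be summarized as an 
ordered list, as in the following Python sketch. The list restates the 5' to 3' order given 
above; the helper is_between and the element labels are hypothetical names used only to check 
the stated positions of the PPE, IRES and WPRE elements. 

```python
# Element order of the MMTV MN14 transcription unit, 5' to 3', as listed in the text.
MMTV_MN14 = [
    "MMTV promoter",
    "PPE (double, mutated)",
    "MN14 heavy chain",
    "EMCV IRES",
    "bovine alpha-LA signal peptide",
    "MN14 light chain",
    "WPRE",
    "3' MoMuLV LTR",
]


def is_between(elements, item, left, right):
    """True if item lies strictly between left and right in the ordered list."""
    i, lo, hi = (elements.index(name) for name in (item, left, right))
    return lo < i < hi


checks = {
    "PPE between promoter and first coding region":
        is_between(MMTV_MN14, "PPE (double, mutated)", "MMTV promoter", "MN14 heavy chain"),
    "IRES between heavy and light chain":
        is_between(MMTV_MN14, "EMCV IRES", "MN14 heavy chain", "MN14 light chain"),
    "WPRE between last coding region and 3' LTR":
        is_between(MMTV_MN14, "WPRE", "MN14 light chain", "3' MoMuLV LTR"),
}
for description, ok in checks.items():
    print(description, "->", ok)
```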

D. α-LA MN14 

The α-LA MN14 (SEQ ID NO:7) construct comprises the following elements, 
arranged in 5' to 3' order: 5' MoMuLV LTR, MoMuLV extended viral packaging signal, 
neomycin phosphotransferase gene, bovine/human alpha-lactalbumin hybrid promoter, double 
mutated PPE element, MN14 heavy chain signal peptide, MN14 antibody heavy chain, IRES 
from encephalomyocarditis virus/bovine αLA signal peptide, MN14 antibody light chain, 
WPRE sequence, and 3' MoMuLV LTR. 

This construct uses the 5' MoMuLV LTR to control production of the neomycin 
phosphotransferase gene. The expression of MN14 antibody is controlled by the hybrid α-LA 
promoter (SEQ ID NO:1). The MN14 heavy chain gene and light chain gene are attached 
together by an IRES sequence/bovine α-LA signal peptide (SEQ ID NO: 3). The α-LA 
promoter drives production of an mRNA containing the heavy chain gene and the light chain 
gene attached by the IRES. Ribosomes attach to the mRNA at the CAP site and at the IRES 
sequence. This allows both heavy and light chain protein to be produced from a single 
mRNA. 

In addition, there are two genetic elements contained within the mRNA to aid in 
export of the mRNA from the nucleus to the cytoplasm and aid in poly-adenylation of the 
mRNA. The mutated PPE sequence (SEQ ID NO:2) is contained between the RNA CAP site 
and the start of the MN14 protein coding region. ATG sequences within the PPE element 
(SEQ ID NO:2) were mutated to prevent potential unwanted translation initiation. Two 
copies of this mutated sequence were used in a head-to-tail array. This sequence is placed 
just downstream of the promoter and upstream of the Kozak sequence and signal 
peptide-coding region. The WPRE was isolated from woodchuck hepatitis virus and also aids 
in the export of mRNA from the nucleus and in stabilizing the mRNA. If this sequence 
is included in the 3' untranslated region of the RNA, the level of protein expression from this 
RNA increases up to 10-fold. The WPRE is contained between the end of MN14 protein 
coding and the poly-adenylation site. The mRNA expression from the LTR as well as from 
the bovine/human alpha-lactalbumin hybrid promoter is terminated and poly-adenylated in the 
3' LTR. 

The bovine/human alpha-lactalbumin hybrid promoter (SEQ ID NO:1) is a modular 
promoter/enhancer element derived from human and bovine alpha-lactalbumin promoter 
sequences. The human portion of the promoter is from +15 relative to transcription start 
point (tsp) to -600 relative to the tsp. The bovine portion is then attached to the end of the 
human portion and corresponds to -550 to -2000 relative to the tsp. The hybrid was 
developed to remove poly-adenylation signals that were present in the bovine promoter and 
hinder retroviral RNA production. It was also developed to contain genetic control elements 
that are present in the human gene, but not the bovine. 

For construction of the bovine/human α-lactalbumin promoter, human genomic DNA 
was isolated and purified. A portion of the human α-lactalbumin promoter was PCR 
amplified using the following two primers: 

Primer 1 (SEQ ID NO: 24): 

5'AAAGCATATGTTCTGGGCCTTGTTACATGGCTGGATTGGTT 3' 
Primer 2 (SEQ ID NO: 25): 

5'TGAATTCGGCGCCCCCAAGAACCTGAAATGGAAGCATCACTCAGTTT 
CATATAT 3' 

These two primers created an NdeI site on the 5' end of the PCR fragment and an EcoRI 
site on the 3' end of the PCR fragment. 

The human PCR fragment created using the above primers was double digested with 
the restriction enzymes NdeI and EcoRI. The plasmid pKBaP-1 was also double digested 
with NdeI and EcoRI. The plasmid pKBaP-1 contains the bovine α-lactalbumin 5' flanking 
region attached to a multiple cloning site. This plasmid allows attachment of various genes to 
the bovine α-lactalbumin promoter. 

Subsequently, the human fragment was ligated/substituted for the bovine fragment of 
the promoter that was removed from the pKBaP-1 plasmid during the double digestion. The 
resulting plasmid was confirmed by DNA sequencing to be a hybrid of the bovine and 
human α-lactalbumin promoter/regulatory regions. 

Attachment of the MN14 light chain gene to the IRES α-lactalbumin signal 
peptide was accomplished as follows. The MN14 light chain was PCR amplified from the 
vector pCRMN14LC using the following primers. 

Primer 1 (SEQ ID NO: 26): 5' CTACAGGTGTCCACGTCGACATCCAGCTGACCCAG 3' 
Primer 2 (SEQ ID NO: 27): 5' CTGCAGAATAGATCTCTAACACTCTCCCCTGTTG 3' 

These primers add a HincII site right at the start of the coding region for mature 
MN14 light chain. Digestion of the PCR product with HincII gives a blunt end fragment 
starting with the initial GAC encoding mature MN14 on the 5' end. Primer 2 adds a BglII 
site to the 3' end of the gene right after the stop codon. The resulting PCR product was 
digested with HincII and BglII and cloned directly into the IRES-Signal Peptide plasmid that 
was digested with NaeI and BglII. 

Next, the vector pCRMN14HC was digested with XhoI and NruI to remove about a 
500 bp fragment. PCR was then used to amplify the same portion of the MN14 heavy chain 
construct that was removed by the XhoI-NruI digestion. This amplification also mutated the 
5' end of the gene to add a better Kozak sequence to the clone. The Kozak sequence was 
modified to resemble the typical IgG Kozak sequence. The PCR primers are shown below. 

Primer 1 (SEQ ID NO: 28): 

5'GAGTGTGATGTGGAGAATTGAGGAGGTGAGCATGGGATGGAGGTGTATCAT 3' 

Primer 2 (SEQ ID NO: 29): 
5'GTGTCTTCGGGTCTCAGGCTGT 3' 

The PCR product was digested with XhoI and NruI and inserted back into the 
previously digested plasmid backbone. 

Next, the "good" Kozak MN14 heavy chain gene construct was digested with EcoRI 
and the heavy chain gene containing fragment was isolated. The IRES α-Lactalbumin Signal 
Peptide MN14 light chain gene construct was also digested with EcoRI. The heavy chain 
gene was then cloned into the EcoRI site of the IRES light chain construct. This resulted in the 
heavy chain gene being placed at the 5' end of the IRES sequence. 

A multiple cloning site was then added to the LNCX retroviral backbone plasmid. 
The LNCX plasmid was digested with HindIII and ClaI. Two oligonucleotide primers were 
produced and annealed together to create a double stranded DNA multiple cloning site. The 
following primers were annealed together. 

Primer 1 (SEQ ID NO: 30): 

5' AGCTTCTCGAGTTAACAGATGTAGGCCTCCTAGGTGGAGAT 3' 
Primer 2 (SEQ ID NO: 31): 

5' CGATGTCGACCTAGGAGGCGTAGATCTGTTAACTCGAGA 3' 

After annealing, the multiple cloning site was ligated into LNCX to create LNC-MCS. 

The double chain gene fragment was then inserted into a retroviral backbone gene 
construct. The double chain gene construct created above was digested with SalI and BglII 
and the double chain containing fragment was isolated. The retroviral expression plasmid 
LNC-MCS was digested with XhoI and BglII. The double chain fragment was then cloned 
into the LNC-MCS retroviral expression backbone. 

Next, an RNA splicing problem in the construct was repaired. The construct was 
digested with NsiI. The resulting fragment was then partially digested with EcoRI. The 
fragments resulting from the partial digest that were approximately 9300 base pairs in size 
were gel purified. A linker was created to mutate the splice donor site at the 3' end of the 
MN14 heavy chain gene. The linker was again created by annealing two oligonucleotide 
primers together to form the double stranded DNA linker. The two primers used to create the 
linker are shown below. 

Primer 1 (SEQ ID NO: 32): 

5'CGAGGCTCTGCACAACCACTACACGCAGAAGAGCCTCTCCCTGTCTCCCGGGAAAT 
GAAAGCCG 3' 

Primer 2 (SEQ ID NO: 33): 

5'AATTCGGCTTTCATTTCCCGGGAGACAGGGAGAGGCTCTTCTGCGTGTAGTGGTTG 
TGCAGAGCCTCGTGCA 3' 

After annealing, the linker was substituted for the original NsiI/EcoRI fragment that 
was removed during the partial digestion. 

Next, the mutated double chain fragment was inserted into the α-Lactalbumin 
expression retroviral backbone LN α-LA-Mertz-MCS. The gene construct produced above 
was digested with BamHI and BglII and the mutated double chain gene containing fragment 
was isolated. The LN α-LA-Mertz-MCS retroviral backbone plasmid was digested with 
BglII. The BamHI/BglII fragment was then inserted into the retroviral backbone plasmid. 

A WPRE element was then inserted into the gene construct. The plasmid BluescriptII 
SK+ WPRE-B11 was digested with BamHI and HincII to remove the WPRE element and the 
element was isolated. The vector created above was digested with BglII and HpaI. The 
WPRE fragment was ligated into the BglII and HpaI sites to create the final gene construct. 



E. α-LA Bot 

The α-LA Bot (SEQ ID NO: 8, botulinum toxin antibody) construct comprises the 
following elements, arranged in 5' to 3' order: bovine/human alpha-lactalbumin hybrid 
promoter, mutated PPE element, cc49 signal peptide, botulinum toxin antibody light chain, 
IRES from encephalomyocarditis virus/bovine α-LA signal peptide, botulinum toxin antibody 
heavy chain, WPRE sequence, and 3' MoMuLV LTR. In addition, the α-LA botulinum toxin 
antibody vector further comprises a 5' MoMuLV LTR, a MoMuLV extended viral packaging 
signal, and a neomycin phosphotransferase gene (these additional elements are provided in 
SEQ ID NO: 7). 

This construct uses the 5' MoMuLV LTR to control production of the neomycin 
phosphotransferase gene. The expression of botulinum toxin antibody is controlled by the 
hybrid α-LA promoter. The botulinum toxin antibody light chain gene and heavy chain gene 
are attached together by an IRES/bovine α-LA signal peptide sequence. The bovine/human 
alpha-lactalbumin hybrid promoter drives production of an mRNA containing the light chain 
gene and the heavy chain gene attached by the IRES. Ribosomes attach to the mRNA at the 
CAP site and at the IRES sequence. This allows both light and heavy chain protein to be 
produced from a single mRNA. 

In addition, there are two genetic elements contained within the mRNA to aid in 
export of the mRNA from the nucleus to the cytoplasm and aid in poly-adenylation of the 
mRNA. The mutated PPE sequence (SEQ ID NO:2) is contained between the RNA CAP site 
and the start of the antibody protein coding region. ATG sequences within the PPE element 
(SEQ ID NO:2) were mutated to prevent potential unwanted translation initiation. Two 
copies of this mutated sequence were used in a head-to-tail array. This sequence was placed 
just downstream of the promoter and upstream of the Kozak sequence and signal 
peptide-coding region. The WPRE was isolated from woodchuck hepatitis virus and also aids 
in the export of mRNA from the nucleus and in stabilizing the mRNA. If this sequence 
is included in the 3' untranslated region of the RNA, the level of protein expression from this 
RNA increases up to 10-fold. The WPRE is contained between the end of the antibody protein 
coding region and the poly-adenylation site. The mRNA expression from the LTR as well as from 
the bovine/human alpha-lactalbumin hybrid promoter is terminated and poly-adenylated in the 
3' LTR. 

The bovine/human α-lactalbumin hybrid promoter (SEQ ID NO:1) is a modular 
promoter/enhancer element derived from human and bovine α-lactalbumin promoter 
sequences. The human portion of the promoter is from +15 relative to transcription start 
point (tsp) to -600 relative to the tsp. The bovine portion is then attached to the end of the 
human portion and corresponds to -550 to -2000 relative to the tsp. The hybrid was developed to 
remove poly-adenylation signals that were present in the bovine promoter and hinder 
retroviral RNA production. It was also developed to contain genetic control elements that are 
present in the human gene, but not the bovine. Likewise, the construct contains control 
elements present in the bovine but not in the human. 

F. LSRNL 

The LSRNL (SEQ ID NO:9) construct comprises the following elements, arranged in 
5' to 3' order: 5' MoMuLV LTR; MoMuLV viral packaging signal; hepatitis B surface 
antigen; RSV promoter; neomycin phosphotransferase gene; and 3' MoMuLV LTR. 

This construct uses the 5' MoMuLV LTR to control production of the hepatitis B 
surface antigen gene. The expression of the neomycin phosphotransferase gene is controlled 
by the RSV promoter. The mRNA expression from the LTR as well as from the RSV 
promoter is terminated and poly-adenylated in the 3' LTR. 

G. α-LA CC49IL2 

The α-LA cc49IL2 (SEQ ID NO: 10; the cc49 antibody is described in U.S. Pat. Nos. 
5,512,443; 5,993,813; and 5,892,019; each of which is herein incorporated by reference) 
construct comprises the following elements, arranged in 5' to 3' order: 5' bovine/human 
α-lactalbumin hybrid promoter; cc49-IL2 coding region; and 3' MoMuLV LTR. This gene 
construct expresses a fusion protein of the single chain antibody cc49 attached to 
Interleukin-2. Expression of the fusion protein is controlled by the bovine/human 
α-lactalbumin hybrid promoter. 

The bovine/human α-lactalbumin hybrid promoter (SEQ ID NO:1) is a modular 
promoter/enhancer element derived from human and bovine alpha-lactalbumin promoter 
sequences. The human portion of the promoter is from +15 relative to transcription start 
point (tsp) to -600 relative to the tsp. The bovine portion is then attached to the end of the 
human portion and corresponds to -550 to -2000 relative to the tsp. The hybrid was developed to 
remove poly-adenylation signals that were present in the bovine promoter and hinder 
retroviral RNA production. It was also developed to contain genetic control elements that are 
present in the human gene, but not the bovine. Likewise, the construct contains control 
elements present in the bovine but not in the human. The 3' viral LTR provides the 
poly-adenylation sequence for the mRNA. 

H. α-LA YP 

The α-LA YP (SEQ ID NO: 11) construct comprises the following elements, arranged 
in 5' to 3' order: 5' bovine/human alpha-lactalbumin hybrid promoter; double mutated PPE 
sequence; bovine αLA signal peptide; Yersinia pestis antibody heavy chain Fab coding 
region; EMCV IRES/bovine α-LA signal peptide; Yersinia pestis antibody light chain Fab 
coding region; WPRE sequence; and 3' MoMuLV LTR. 

This gene construct will cause the expression of Yersinia pestis mouse Fab antibody. 
The expression of the gene construct is controlled by the bovine/human α-lactalbumin hybrid 
promoter. The PPE sequence and the WPRE sequence aid in moving the mRNA from the 
nucleus to the cytoplasm. The IRES sequence allows both the heavy and the light chain 
genes to be translated from the same mRNA. The 3' viral LTR provides the poly-adenylation 
sequence for the mRNA. 

In addition, there are two genetic elements contained within the mRNA to aid in 
export of the mRNA from the nucleus to the cytoplasm and aid in poly-adenylation of the 
mRNA. The mutated PPE sequence (SEQ ID NO:2) is contained between the RNA CAP site 
and the start of the antibody protein coding region. ATG sequences within the PPE element 
(SEQ ID NO:2) were mutated (bases 4, 112, 131, and 238 of SEQ ID NO: 2 were changed 
from a G to a T) to prevent potential unwanted translation initiation. Two copies of this 
mutated sequence were used in a head-to-tail array. This sequence was placed just 
downstream of the promoter and upstream of the Kozak sequence and signal peptide-coding 
region. The WPRE was isolated from woodchuck hepatitis virus and also aids in the export 
of mRNA from the nucleus and in stabilizing the mRNA. If this sequence is included 
in the 3' untranslated region of the RNA, the level of protein expression from this RNA increases 
up to 10-fold. The WPRE is contained between the end of the antibody protein coding region and 
the poly-adenylation site. The mRNA expression from the LTR as well as from the 
bovine/human alpha-lactalbumin hybrid promoter is terminated and poly-adenylated in the 3' 
LTR. 

The bovine/human alpha-lactalbumin hybrid promoter (SEQ ID NO:1) is a modular 
promoter/enhancer element derived from human and bovine alpha-lactalbumin promoter 
sequences. The human portion of the promoter is from +15 relative to transcription start 
point (tsp) to -600 relative to the tsp. The bovine portion is then attached to the end of the 
human portion and corresponds to -550 to -2000 relative to the tsp. The hybrid was developed to 
remove poly-adenylation signals that were present in the bovine promoter and hinder 
retroviral RNA production. It was also developed to contain genetic control elements that are 
present in the human gene, but not the bovine. Likewise, the construct contains control 
elements present in the bovine but not in the human. 

Example 2 

Generation of Cell Lines Stably Expressing the MoMLV gag and pol Proteins 

Examples 2-5 describe the production of pseudotyped retroviral vectors. These 
methods are generally applicable to the production of the vectors described above. The 
expression of the fusogenic VSV G protein on the surface of cells results in syncytium 
formation and cell death. Therefore, in order to produce retroviral particles containing the 
VSV G protein as the membrane-associated protein, a two-step approach was taken. First, 
stable cell lines expressing the gag and pol proteins from MoMLV at high levels were 
generated (e.g., 293GP^SD cells). The stable cell line which expresses the gag and pol proteins 
produces noninfectious viral particles lacking a membrane-associated protein (e.g., an 
envelope protein). The stable cell line was then co-transfected, using calcium phosphate 
precipitation, with VSV-G and gene of interest plasmid DNAs. The pseudotyped vector 
generated was used to infect 293GP^SD cells to produce stably transformed cell lines. Stable 
cell lines can be transiently transfected with a plasmid capable of directing the high level 
expression of the VSV G protein (see below). The transiently transfected cells produce VSV 
G-pseudotyped retroviral vectors which can be collected from the cells over a period of 3 to 4 
days before the producing cells die as a result of syncytium formation. 

The first step in the production of VSV G-pseudotyped retroviral vectors, the 
generation of stable cell lines expressing the MoMLV gag and pol proteins, is described 
below. The human adenovirus Ad-5-transformed embryonal kidney cell line 293 (ATCC CRL 
1573) was cotransfected with pCMVgag-pol and a gene conferring resistance to phleomycin. 
pCMVgag-pol contains the MoMLV gag and pol genes under the control of the CMV 
promoter (pCMVgag-pol is available from the ATCC). 

The plasmid DNA was introduced into the 293 cells using calcium phosphate 
co-precipitation (Graham and Van der Eb, Virol. 52:456 [1973]). Approximately 5 x 10^ 293 
cells were plated into a 100 mm tissue culture plate the day before the DNA co-precipitate 
was added. Stable transformants were selected by growth in DMEM-high glucose medium 
containing 10% FCS and 10 μg/ml phleomycin (selective medium). Colonies which grew in 
the selective medium were screened for extracellular reverse transcriptase activity (Goff et al., 
J. Virol. 38:239 [1981]) and intracellular p30gag expression. The presence of p30gag 
expression was determined by Western blotting using a goat anti-p30 antibody (NCI antiserum 
77S000087). A clone which exhibited stable expression of the retroviral genes was selected. 
This clone was named 293GP^SD (293 gag-pol-San Diego). The 293GP^SD cell line, a derivative 
of the human Ad-5-transformed embryonal kidney cell line 293, was grown in DMEM-high 
glucose medium containing 10% FCS. 

Example 3 

Preparation of Pseudotyped Retroviral Vectors Bearing the G Glycoprotein of VSV 

In order to produce VSV G protein pseudotyped retrovirus the following steps were 
taken. The 293GP^SD cell line was co-transfected with VSV-G plasmid and DNA plasmid of 
interest. This co-transfection generates the infectious particles used to infect 293GP^SD cells to 
generate the packaging cell lines. This Example describes the production of pseudotyped 
LNBOTDC virus. This general method may be used to produce any of the vectors described 
in Example 1. 

a) Cell Lines and Plasmids 

The packaging cell line, 293GP^SD, was grown in alpha-MEM-high glucose medium 
containing 10% FCS. The titer of the pseudotyped virus may be determined using either 
208F cells (Quade, Virol. 98:461 [1979]) or NIH/3T3 cells (ATCC CRL 1658); 208F and 
NIH/3T3 cells are grown in DMEM-high glucose medium containing 10% CS. 

The plasmid LNBOTDC contains the gene encoding BOTD under the transcriptional 
control of cytomegalovirus intermediate-early promoter followed by the gene encoding 
neomycin phosphotransferase (Neo) under the transcriptional control of the LTR promoter. 
The plasmid pHCMV-G contains the VSV G gene under the transcriptional control of the 
human cytomegalovirus intermediate-early promoter (Yee et al, Meth. Cell Biol. 43:99 
[1994]). 

b) Production of stable packaging cell lines, pseudotyped vector and Titering 
of Pseudotyped LNBOTDC Vector 

LNBOTDC DNA (SEQ ID NO: 13) was co-transfected with pHCMV-G DNA into the 
packaging line 293GP^SD to produce LNBOTDC virus. The resulting LNBOTDC virus was 
then used to infect 293GP^SD cells to transform the cells. The procedure for producing 
pseudotyped LNBOTDC virus was carried out as described (Yee et al., Meth. Cell Biol. 43:99 
[1994]). 

This is a retroviral gene construct that, upon creation of infectious replication defective 
retroviral vector, will cause the insertion of the sequence described above into the cells of 
interest. Upon insertion the CMV regulatory sequences control the expression of the 
botulinum toxin antibody heavy and light chain genes. The IRES sequence allows both the 
heavy and the light chain genes to be translated from the same mRNA. The 3' viral LTR 
provides the poly-adenylation sequence for the mRNA. 

Both heavy and light chain protein for botulinum toxin antibody are produced from 
this single mRNA. The two proteins associate to form active botulinum toxin antibody. The 
heavy and light chain proteins also appear to be formed in an equal molar ratio to each other. 

Briefly, on day 1, approximately 5 x 10"* 293GP^SD cells were placed in a 75 cm^2 tissue 
culture flask. On the following day (day 2), the 293GP^SD cells were transfected with 25 μg of 
pLNBOTDC plasmid DNA and 25 μg of VSV-G plasmid DNA using the standard calcium 
phosphate co-precipitation procedure (Graham and Van der Eb, Virol. 52:456 [1973]). A 
range of 10 to 40 μg of plasmid DNA may be used. Because 293GP^SD cells may take more 
than 24 hours to attach firmly to tissue culture plates, the 293GP^SD cells may be placed in 75 
cm^2 flasks 48 hours prior to transfection. The transfected 293GP^SD cells provide pseudotyped 
LNBOTDC virus. 

On day 3, approximately 1 x 10^ 293GP^SD cells were placed in a 75 cm^2 tissue culture 
flask 24 hours prior to the harvest of the pseudotyped virus from the transfected 293GP^SD 
cells. On day 4, culture medium was harvested from the transfected 293GP^SD cells 48 hours 
after the application of the pLNBOTDC and VSV-G DNA. The culture medium was filtered 
through a 0.45 μm filter and polybrene was added to a final concentration of 8 μg/ml. The 
culture medium containing LNBOTDC virus was used to infect the 293GP^SD cells as follows. 
The culture medium was removed from the 293GP^SD cells and was replaced with the 
LNBOTDC virus containing culture medium. Polybrene was added to the medium following 
addition to cells. The virus containing medium was allowed to remain on the 293GP^SD cells 
for 24 hours. Following the 16 hour infection period (on day 5), the medium was removed 
from the 293GP^SD cells and was replaced with fresh medium containing 400 μg/ml G418 
(GIBCO/BRL). The medium was changed approximately every 3 days until G418-resistant 
colonies appeared approximately two weeks later. 

The G418-resistant 293 colonies were plated as single cells in 96-well plates. Sixty to one 
hundred G418-resistant colonies were screened for the expression of the BOTDC antibody in 
order to identify high producing clones. The top 10 clones in 96-well plates were transferred 
to 6-well plates and allowed to grow to confluency. 

The top 10 clones were then expanded to screen for high titer production. Based on 
protein expression and titer production, 5 clonal cell lines were selected. One line was 
designated the master cell bank and the other 4 as backup cell lines. Pseudotyped vector was 
generated as follows. Approximately 1 x 10^ 293GP^SD/LNBOTDC cells were placed into a 
75 cm^2 tissue culture flask. Twenty-four hours later, the cells were transfected with 25 μg of 
pHCMV-G plasmid DNA using calcium phosphate co-precipitation. Six to eight hours after 
the calcium-DNA precipitate was applied to the cells, the DNA solution was replaced with 
fresh culture medium (lacking G418). Longer transfection times (overnight) were found to 
result in the detachment of the majority of the 293GP^SD/LNBOTDC cells from the plate and 
are therefore avoided. The transfected 293GP^SD/LNBOTDC cells produce pseudotyped 
LNBOTDC virus. 

The pseudotyped LNBOTDC virus generated from the transfected 293GP^SD/LNBOTDC 
cells can be collected at least once a day between 24 and 96 hr after transfection. The highest 
virus titer was generated approximately 48 to 72 hr after initial pHCMV-G transfection. 
While syncytium formation became visible about 48 hr after transfection in the majority of 
the transfected cells, the cells continued to generate pseudotyped virus for at least an 
additional 48 hr as long as the cells remained attached to the tissue culture plate. The 
collected culture medium containing the VSV G-pseudotyped LNBOTDC virus was pooled, 
filtered through a 0.45 μm filter and stored at -80°C or concentrated immediately and then 
stored at -80°C. 

The titer of the VSV G-pseudotyped LNBOTDC virus was then determined as follows.
Approximately 5 x 10^ rat 208F fibroblast cells were plated into 6-well plates. Twenty-four
hours after plating, the cells were infected with serial dilutions of the LNBOTDC
virus-containing culture medium in the presence of 8 µg/ml polybrene. Twenty-four hours
after infection with virus, the medium was replaced with fresh medium containing 400 µg/ml
G418 and selection was continued for 14 days until G418-resistant colonies became visible.
Viral titers were typically about 0.5 to 5.0 x 10^ colony forming units (cfu)/ml. The titer of
the virus stock could be concentrated to a titer of greater than 10^ cfu/ml as described below.
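
The colony-count titration described above reduces to simple arithmetic, sketched here in Python. The function name, the colony count, the dilution factor and the inoculum volume are all hypothetical illustrations and are not values taken from this specification.

```python
def titer_cfu_per_ml(colonies, dilution_factor, inoculum_ml):
    """Estimate the titer (cfu/ml) of the undiluted virus stock.

    colonies        -- G418-resistant colonies counted in one well
    dilution_factor -- fold dilution of the stock applied to that well (e.g. 1e4)
    inoculum_ml     -- volume of diluted virus added to the well, in ml
    """
    if colonies < 0 or dilution_factor <= 0 or inoculum_ml <= 0:
        raise ValueError("colonies must be >= 0; dilution and volume must be > 0")
    # Each colony is assumed to arise from one infectious particle in the inoculum.
    return colonies * dilution_factor / inoculum_ml


# Hypothetical well: 42 colonies from 1 ml of a 10,000-fold dilution.
print(titer_cfu_per_ml(42, 1e4, 1.0))
```

In practice several dilutions would be counted and averaged; the sketch treats a single well only.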



Example 4 

Concentration of Pseudotyped Retroviral Vectors

The VSV G-pseudotyped LNBOTDC viruses were concentrated to a high titer by one
cycle of ultracentrifugation. However, two cycles can be performed for further concentration.
The frozen culture medium collected as described in Example 2, which contained pseudotyped
LNBOTDC virus, was thawed in a 37°C water bath and was then transferred to Oakridge
centrifuge tubes (50 ml Oakridge tubes with sealing caps, Nalge Nunc International)
previously sterilized by autoclaving. The virus was sedimented in a JA20 rotor (Beckman) at
48,000 x g (20,000 rpm) at 4°C for 120 min. The culture medium was then removed from
the tubes in a biosafety hood and the media remaining in the tubes was aspirated to remove
the supernatant. The virus pellet was resuspended to 0.5 to 1% of the original volume of
culture medium DMEM. The resuspended virus pellet was incubated overnight at 4°C without
swirling. The virus pellet could be dispersed with gentle pipetting after the overnight
incubation without significant loss of infectious virus. The titer of the virus stock was
routinely increased 100- to 300-fold after one round of ultracentrifugation. The efficiency of
recovery of infectious virus varied between 30 and 100%.
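
The relationship between volume reduction, fold increase in titer and recovery of infectious virus can be sketched as follows. The function names and the example volumes and titers are hypothetical and chosen only for illustration; the recovery figure is simply total cfu after concentration divided by total cfu before.

```python
def volume_fold(original_ml, resuspended_ml):
    """Fold reduction in volume achieved by pelleting and resuspending."""
    return original_ml / resuspended_ml


def titer_fold(titer_before, titer_after):
    """Fold increase in titer (cfu/ml) after concentration."""
    return titer_after / titer_before


def recovery_fraction(titer_before, titer_after, original_ml, resuspended_ml):
    """Fraction of infectious virus recovered: total cfu after / total cfu before."""
    total_before = titer_before * original_ml
    total_after = titer_after * resuspended_ml
    return total_after / total_before


# Hypothetical run: 50 ml resuspended in 0.25 ml (0.5% of the original volume),
# with the titer rising from 1e6 to 1e8 cfu/ml.
print(volume_fold(50, 0.25))
print(titer_fold(1e6, 1e8))
print(recovery_fraction(1e6, 1e8, 50, 0.25))
```

A titer increase smaller than the volume reduction indicates that some infectious virus was lost during pelleting or resuspension.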

The virus stock was then subjected to low speed centrifugation in a microfuge for 5
min at 4°C to remove any visible cell debris or aggregated virions that were not resuspended
under the above conditions. It was noted that if the virus stock is not to be used for injection
into oocytes or embryos, this centrifugation step may be omitted.

The virus stock can be subjected to another round of ultracentrifugation to further
concentrate the virus stock. The resuspended virus from the first round of centrifugation is
pooled and pelleted by a second round of ultracentrifugation which is performed as described
above. Viral titers are increased approximately 2000-fold after the second round of
ultracentrifugation (titers of the pseudotyped LNBOTDC virus are typically greater than or
equal to 1 x 10^ cfu/ml after the second round of ultracentrifugation).

The titers of the pre- and post-centrifugation fluids were determined by infection of
208F cells (NIH 3T3 or bovine mammary epithelial cells can also be employed) followed by
selection of G418-resistant colonies as described above in Example 2.

Example 5

Preparation of Pseudotyped Retrovirus For Infection of Host Cells



The concentrated pseudotyped retroviruses were resuspended in 0.1X HBS (2.5 mM
HEPES, pH 7.12, 14 mM NaCl, 75 µM Na2HPO4-H2O) and 18 µl aliquots were placed in 0.5
ml vials (Eppendorf) and stored at -80°C until used. The titer of the concentrated vector was
determined by diluting 1 µl of the concentrated virus 10^- or 10^-fold with 0.1X HBS. The
diluted virus solution was then used to infect 208F and bovine mammary epithelial cells and
viral titers were determined as described in Example 2.

Example 6 

Expression of MN14 by Host Cells 

This Example describes the production of antibody MN14 from cells transfected with a
high number of integrating vectors. Pseudotyped vectors were made from the packaging cell
lines for the following vectors: CMV MN14, α-LA MN14, and MMTV MN14. Rat
fibroblasts (208F cells), MDBK cells (bovine kidney cells), and bovine mammary epithelial
cells were transfected at a multiplicity of infection of 1000. One thousand cells were plated
in a T25 flask and 10^ colony forming units (CFUs) of vector in 3 ml media was incubated
with the cells. The duration of the infection was 24 hr, followed by a media change.
Following transfection, the cells were allowed to grow and become confluent.
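
The vector dose implied by a multiplicity of infection can be worked out as below. This is a minimal sketch: the function names are hypothetical, and the stock titer used in the volume calculation is an assumed value for illustration, not one reported here. With 1000 cells and an MOI of 1000, the arithmetic gives 1,000,000 CFU.

```python
def cfu_required(cell_count, moi):
    """Colony forming units needed to reach a given multiplicity of infection."""
    return cell_count * moi


def inoculum_volume_ml(cell_count, moi, stock_titer_cfu_per_ml):
    """Volume of vector stock (ml) that delivers the required CFU."""
    return cfu_required(cell_count, moi) / stock_titer_cfu_per_ml


# 1000 plated cells at an MOI of 1000, as in the Example above.
print(cfu_required(1000, 1000))

# Assumed stock titer of 1e6 CFU/ml (hypothetical).
print(inoculum_volume_ml(1000, 1000, 1e6))
```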

The cell lines were grown to confluency in T25 flasks and 5 ml of media was changed
daily. The media was assayed daily for the presence of MN14. All of the MN14 produced is
active (an ELISA to detect human IgG gave the exact same values as the CEA binding
ELISA) and Western blotting has shown that the heavy and light chains are produced at
what appears to be a 1:1 ratio. In addition, a non-denaturing Western blot indicated that
what appeared to be 100% of the antibody complexes were correctly formed (See Figure 1:
Lane 1, 85 ng control MN14; Lane 2, bovine mammary cell line, α-LA promoter; Lane 3,
bovine mammary cell line, CMV promoter; Lane 4, bovine kidney cell line, α-LA promoter;
Lane 5, bovine kidney cell line, CMV promoter; Lane 6, 208 cell line, α-LA promoter; Lane
7, 208 cell line, CMV promoter).

Figure 2 is a graph showing the production of MN14 over time for four cell lines.
The Y axis shows MN14 production in ng/ml of media. The X-axis shows the day of media
collection for the experiment. Four sets of data are shown on the graph. The comparisons
are between the CMV and α-LA promoter and between the 208 cells and the bovine
mammary cells. The bovine mammary cell line exhibited the highest expression, followed by
the 208F cells and MDBK cells. With respect to the constructs, the CMV driven construct
demonstrated the highest level of expression, followed by the α-LA driven gene construct and
the MMTV construct. At 2 weeks, the level of daily production of the CMV construct was
4.5 µg/ml of media (22.5 µg/day in a T25 flask). The level of expression subsequently
increased slowly to 40 µg/day as the cells became very densely confluent over the subsequent
week. 2.7 L of media from an α-lac-MN14 packaging cell line was processed by affinity
chromatography to produce a purified stock of MN14.
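
The daily yield per flask follows from the measured concentration and the volume of media changed each day. As a minimal sketch (the function name is hypothetical), 4.5 µg/ml with a 5 ml daily media change works out to 22.5 µg per day:

```python
def daily_yield_ug(concentration_ug_per_ml, media_volume_ml):
    """Antibody harvested per day (µg) from one flask with a full daily media change."""
    return concentration_ug_per_ml * media_volume_ml


# 4.5 µg/ml in the 5 ml of media changed daily in a T25 flask.
print(daily_yield_ug(4.5, 5))
```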

Figure 3 is a Western blot of a 15% SDS-PAGE gel run under denaturing conditions in
order to separate the heavy and light chains of the MN14 antibody. Lane 1 shows MN14
from bovine mammary cell line, hybrid α-LA promoter; lane 2 shows MN14 from bovine
mammary cell line, CMV promoter; lane 3 shows MN14 from bovine kidney cell line, hybrid
α-LA promoter; lane 4 shows MN14 from bovine kidney cell line, CMV promoter; lane 5
shows MN14 from rat fibroblast cell line, hybrid α-LA promoter; lane 6 shows MN14 from
rat fibroblast, CMV promoter. In agreement with Figure 1 above, the results show that the
heavy and light chains are produced in a ratio of approximately 1:1.

Example 7 

Quantitation of Protein Produced Per Cell 

This Example describes the quantitation of the amount of protein produced per cell in
cell cultures produced according to the invention. Various cells (208F cells, MDBK cells, and
bovine mammary cells) were plated in 25 cm² culture dishes at 1000 cells/dish. Three
different vectors (CMV-MN14, MMTV-MN14, and α-LA-MN14) were used to infect the
three cell types at an MOI of 1000 (titers: 2.8 x 10^, 4.9 x 10^, and 4.3 x 10^, respectively).
Media was collected approximately every 24 hours from all cells. Following one month of
media collection, the 208F and MDBK cells were discarded due to poor health and low MN14
expression. The cells were passaged to T25 flasks and collection of media from the bovine
mammary cells was continued for approximately 2 months with continued expression of
MN14. After two months in T25 flasks, the cells with CMV promoters were producing 22.5
pg/cell/day and the cells with α-LA promoters were producing 2.5 pg MN14/cell/day.
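
Per-cell productivity is the daily yield divided by the number of cells in the culture. The sketch below shows the unit conversion; the function names are hypothetical, and the daily yield and cell count passed in are assumed inputs for illustration, since no cell count is stated for this measurement.

```python
PG_PER_UG = 1_000_000  # 1 µg = 1e6 pg


def per_cell_productivity_pg(total_ug_per_day, cell_count):
    """Specific productivity in pg/cell/day from a daily yield in µg."""
    return total_ug_per_day * PG_PER_UG / cell_count


def implied_cell_count(total_ug_per_day, pg_per_cell_per_day):
    """Cell number that would reconcile a daily yield with a per-cell rate."""
    return total_ug_per_day * PG_PER_UG / pg_per_cell_per_day


# Hypothetical flask: 22.5 µg/day harvested from an assumed 1e6 cells.
print(per_cell_productivity_pg(22.5, 1_000_000))

# Cell count implied by 22.5 µg/day at 22.5 pg/cell/day.
print(implied_cell_count(22.5, 22.5))
```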

After 2 months in T25 flasks, roller bottles (850 cm²) were seeded to scale up
production and to determine if MN14 expression was stable following multiple passages.
Two roller bottles were seeded with bovine mammary cells expressing MN14 from a CMV
promoter and two roller bottles were seeded with bovine mammary cells expressing MN14
from the α-LA promoter. The cultures reached confluency after approximately two weeks
and continued to express MN14. Roller bottle expression is shown in Table 1 below.



Table 1. MN14 Production in Roller Bottles

Cell Line         Promoter   Week   MN14 Production/Week (µg/ml)   MN14 Production/Week - Total (µg)
Bovine mammary    CMV        1      2.6                            520
Bovine mammary    CMV        2      10.6                           2120
Bovine mammary    CMV        3      8.7                            1740
Bovine mammary    CMV        4      7.8                            1560
Bovine mammary    α-LA       1      0.272                          54.4
Bovine mammary    α-LA       2      2.8                            560
Bovine mammary    α-LA       3      2.2                            440
Bovine mammary    α-LA       4      2.3                            460
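
Each weekly total in Table 1 is 200 times the corresponding per-ml value, which suggests roughly 200 ml of media collected per week. That volume is inferred from the numbers and is not stated in the text, so the sketch below treats it as an assumption; the variable and function names are likewise hypothetical.

```python
import math

# (promoter, week, µg/ml, weekly total) as listed in Table 1.
ROWS = [
    ("CMV", 1, 2.6, 520.0),
    ("CMV", 2, 10.6, 2120.0),
    ("CMV", 3, 8.7, 1740.0),
    ("CMV", 4, 7.8, 1560.0),
    ("alpha-LA", 1, 0.272, 54.4),
    ("alpha-LA", 2, 2.8, 560.0),
    ("alpha-LA", 3, 2.2, 440.0),
    ("alpha-LA", 4, 2.3, 460.0),
]

ASSUMED_WEEKLY_VOLUME_ML = 200  # inferred from the table, not stated in the text


def weekly_total(concentration_ug_per_ml):
    """Weekly total implied by a concentration and the assumed media volume."""
    return concentration_ug_per_ml * ASSUMED_WEEKLY_VOLUME_ML


def promoter_total(promoter):
    """Sum of the listed weekly totals for one promoter over the four weeks."""
    return sum(total for name, _, _, total in ROWS if name == promoter)


# Check that every listed total matches concentration x 200 ml.
consistent = all(math.isclose(weekly_total(c), t) for _, _, c, t in ROWS)
print(consistent)
print(promoter_total("CMV"), promoter_total("alpha-LA"))
```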



Example 8 
Expression of LL2 Antibody 

This Example demonstrates the expression of antibody LL2 by bovine mammary cells
and 293 human kidney fibroblast cells. Bovine mammary cells were infected with vector
CMV LL2 (7.85 x 10^ CFU/ml) at MOIs of 1000 and 10,000 and plated in 25 cm² culture
dishes. None of the cells survived transfection at the MOI of 10,000. At 20% confluency,
250 ng/ml of LL2 was present in the media. Active LL2 antibody was produced by both cell
types. Non-denaturing and denaturing Western analysis demonstrated that all the antibody
produced is active and correctly assembled in approximately a 1:1 ratio of heavy:light chain.






Example 9 

Expression of Bot Antibody by Bovine Mammary Cells 



This Example demonstrates the expression of botulinum toxin antibody in bovine
mammary cells. Bovine mammary cells were infected with vector α-LA Bot (2.2 x 10^
CFU/ml) and plated in 25 cm² culture dishes. At 100% confluency, 6 ng/ml of botulinum
toxin antibody was present in the media.

Example 10 

Expression of Hepatitis B Surface Antigen by Bovine Mammary Cells

This Example demonstrates the expression of Hepatitis B Surface Antigen in
bovine mammary cells. Bovine mammary cells were infected with vector LSRNL (350
CFU/ml) and plated in 25 cm² culture dishes. At 100% confluency, 20 ng/ml of Hepatitis B
Surface Antigen was present in the media.

Example 11 

Expression of cc49IL2 Antigen Binding Protein 

This Example demonstrates the expression of cc49IL2 in bovine mammary cells and
human kidney fibroblast cells. Bovine mammary cells were infected with vector LSRNL (3.1
x 10^ CFU/ml) at a MOI of 1000 and plated in 25 cm² culture dishes. At 100% confluency,
10 µg/ml of CC49IL2 was present in the media. Human kidney fibroblast (293) cells were
infected with the α-LA cc49IL2 vector. Active cc49-IL2 fusion protein was produced by the
cells.

Example 12 
Production of YP antibody 

This Example demonstrates the production of Yersinia pestis antibody by bovine
mammary epithelial cells and human kidney fibroblast cells (293 cells). Cell lines were
infected with the α-LA YP vector. Both of the cell lines produced YP antibody. All of the
antibody is active and the heavy and light chains are produced in a ratio approximating 1:1.

Example 13 

Expression of Multiple Proteins by Bovine Mammary Cells 

This Example demonstrates the expression of multiple proteins in bovine mammary
cells. Mammary cells producing MN14 (infected with CMV-MN14 vector) were infected
with CC49IL2 vector (3.1 x 10^ CFU/ml) at an MOI of 1000, and 1000 cells were plated in
25 cm² culture plates. At 100% confluency, the cells expressed MN14 at 2.5 µg/ml and
CC49IL2 at 5 µg/ml.



Example 14 

Expression of Multiple Proteins by Bovine Mammary Cells 

This Example demonstrates the expression of multiple proteins in bovine mammary
cells. Mammary cells producing MN14 (infected with CMV-MN14 vector) were infected
with LSRNL vector (100 CFU/ml) at an MOI of 1000, and 1000 cells were plated in 25 cm²
culture plates. At 100% confluency, the cells expressed MN14 at 2.5 µg/ml and hepatitis B
surface antigen at 150 ng/ml.

Example 15 

Expression of Multiple Proteins by Bovine Mammary Cells 

This Example demonstrates the expression of multiple proteins in bovine mammary
cells. Mammary cells producing hepatitis B surface antigen (infected with LSRNL vector)
were infected with cc49IL2 vector at an MOI of 1000, and 1000 cells were plated in 25 cm²
culture plates. At 100% confluency, the cells expressed MN14 at 2.4 and hepatitis B surface
antigen at 13.

Example 16 

Expression of Hepatitis B Surface Antigen and Bot Antibody in Bovine Mammary Cells 

This Example demonstrates the culture of transfected cells in roller bottle cultures.
208F cells and bovine mammary cells were plated in 25 cm² culture dishes at 1000 cells/
25 cm². LSRNL or α-LA Bot vectors were used to infect each cell line at a MOI of 1000.
Following one month of culture and media collection, the 208F cells were discarded due to
poor growth and plating. Likewise, the bovine mammary cells infected with α-LA Bot were
discarded due to low protein expression. The bovine mammary cells infected with LSRNL
were passaged to seed roller bottles (850 cm²). Approximately 20 ng/ml hepatitis type B
surface antigen was produced in the roller bottle cultures.

Example 17

Expression and Assay of G-protein Coupled Receptors 

This example describes the expression of a G-Protein Coupled Receptor protein
(GPCR) from a retroviral vector. This example also describes the expression of a signal
protein from an IRES as a marker for expression of a difficult to assay protein or a protein
that has no assay such as a GPCR. The gene construct (SEQ ID NO: 34; Figure 17)
comprises a G-protein-coupled receptor followed by the IRES-signal peptide-antibody light
chain cloned into the MCS of pLBCX retroviral backbone. Briefly, a PvuII/PvuII fragment
(3057 bp) containing the GPCR-IRES-antibody light chain was cloned into the StuI site of
pLBCX. pLBCX contains the EM7 (T7) promoter, Blasticidin gene and SV40 polyA in place
of the Neomycin resistance gene from pLNCX.

The gene construct was used to produce a replication defective retroviral packaging
cell line and this cell line was used to produce replication defective retroviral vector. The
vector produced from this cell line was then used to infect 293GP cells (human embryonic
kidney cells). After infection, the cells were placed under Blasticidin selection and single cell
Blasticidin resistant clones were isolated. The clones were screened for expression of
antibody light chain. The top 12 light chain expressing clones were selected. These 12 light
chain expressing clones were then screened for expression of the GPCR using a ligand
binding assay. All twelve of the samples also expressed the receptor protein. The clonal cell
lines and their expression are shown in Table 2.



Table 2

Cell Clone Number   Antibody Light Chain Expression   GPCR Expression
4                   +                                 +
8                   +                                 +
13                  +                                 +
19                  +                                 +
20                  +                                 +
22                  +                                 +
24                  +                                 +
27                  +                                 +
30                  +                                 +
45                  +                                 +
46                  +                                 +
50                  +                                 +



All publications and patents mentioned in the above specification are herein 
incorporated by reference. Various modifications and variations of the described method and 
system of the invention will be apparent to those skilled in the art without departing from the 
scope and spirit of the invention. Although the invention has been described in connection 
with specific preferred embodiments, it should be understood that the invention as claimed 
should not be unduly limited to such specific embodiments. Indeed, various modifications of 
the described modes for carrying out the invention which are obvious to those skilled in 
molecular biology, protein fermentation, biochemistry, or related fields are intended to be 
within the scope of the following claims. 

SEQUENCE LISTING

<110> Bleck, Gregory
      Bremel, Robert
      Miller, Linda

<120> Expression Vectors

<130> GALA-04406

<150> 60/215,851
<151> 2000-07-03

<160> 36

<170> PatentIn version 3.0

<210> 1
<211> 2101
<212> DNA
<213> Artificial Sequence

<220>
<223> Synthetic

<400> 1

gatcagtcct gggtggtcat tgaaaggact gatgctgaag ttgaagctcc aatactttgg       60
ccacctgatg cgaagaactg actcatgtga taagaccctg atactgggaa agattgaagg      120
caggaggaga agggatgaca gaggatggaa gagttggatg gaatcaccaa ctcgatggac      180
atgagtttga gcaagcttcc aggagttggt aatgggcagg gaagcctggc gtgctgcagt      240
ccatggggtt gcaaagagtt ggacactact gagtgactga actgaactga tagtgtaatc      300
catggtacag aatataggat aaaaaagagg aagagtttgc cctgattctg aagagttgta      360
ggatataaaa gtttagaata cctttagttt ggaagtctta aattatttac ttaggatggg      420
tacccactgc aatataagaa atcaggcttt agagactgat gtagagagaa tgagccctgg      480
cataccagaa gctaacagct attggttata gctgttataa ccaatatata accaatatat      540
tggttatata gcatgaagct tgatgccagc aatttgaagg aaccatttag aactagtatc      600
ctaaactcta catgttccag gacactgatc ttaaagctca ggttcagaat cttgttttat      660
aggctctagg tgtatattgt ggggcttccc tggtggctca gatggtaaag tgtctgcctg      720
caatgtgggt gatctgggtt cgatccctgg cttgggaaga tcccctggag aaggaaatgg      780
caacccactc tagtactctt acctggaaaa ttccatggac agaggagcct tgtaagctac      840
agtccatggg attgcaaaga gttgaacaca actgagcaac taagcacagc acagtacagt      900
atacacctgt gaggtgaagt gaagtgaagg ttcaatgcag ggtctcctgc attgcagaaa      960
gattctttac catctgagcc accagggaag cccaagaata ctggagtggg tagcctattc     1020
cttctccagg ggatcttccc atcccaggaa ttgaactgga gtctcctgca tttcaggtgg     1080
attcttcacc agctgaacta ccaggtggat actactccaa tattaaagtg cttaaagtcc     1140
agttttccca cctttcccaa aaaggttggg tcactctttt ttaaccttct gtggcctact     1200
ctgaggctgt ctacaagctt atatatttat gaacacattt attgcaagtt gttagtttta     1260
gatttacaat gtggtatctg gctatttagt ggtattggtg gttggggatg gggaggctga     1320
tagcatctca gagggcagct agatactgtc atacacactt ttcaagttct ccatttttgt     1380
gaaatagaaa gtctctggat ctaagttata tgtgattctc agtctctgtg gtcatattct     1440
attctactcc tgaccactca acaaggaacc aagatatcaa gggacacttg ttttgtttca     1500
tgcctgggtt gagtgggcca tgacatatgt tctgggcctt gttacatggc tggattggtt     1560
ggacaagtgc cagctctgat cctgggactg tggcatgtga tgacatacac cccctctcca     1620
cattctgcat gtctctaggg gggaaggggg aagctcggta tagaaccttt attgtatttt     1680
ctgattgcct cacttcttat attgccccca tgcccttctt tgttcctcaa gtaaccagag     1740
acagtgcttc ccagaaccaa ccctacaaga aacaaagggc taaacaaagc caaatgggaa     1800
gcaggatcat ggtttgaact ctttctggcc agagaacaat acctgctatg gactagatac     1860
tgggagaggg aaaggaaaag tagggtgaat tatggaagga agctggcagg ctcagcgttt     1920
ctgtcttggc atgaccagtc tctcttcatt ctcttcctag atgtagggct tggtaccaga     1980
gcccctgagg ctttctgcat gaatataaat atatgaaact gagtgatgct tccatttcag     2040
gttcttgggg gcgccgaatt cgagctcggt acccggggat ctcgaggggg ggcccggtac     2100
c                                                                     2101

<210> 2
<211> 245
<212> DNA
<213> Artificial Sequence

<220>
<223> Synthetic

<400> 2
gattacttac tggcaggtgc tgggggcttc cgagacaatc gcgaacatct acaccacaca       60
acaccgcctc gaccagggtg agatatcggc cggggacgcg gcggtggtaa ttacaagcga      120
ggatccgatt acttactggc aggtgctggg ggcttccgag acaatcgcga acatctacac      180
cacacaacac cgcctcgacc agggtgagat atcggccggg gacgcggcgg tggtaattac      240
aagcg                                                                  245

<210> 3
<211> 680
<212> DNA
<213> Artificial Sequence

<220>
<223> Synthetic

<400> 3





ggaattcgcc cctctccctc ccccccccct aacgttactg gccgaagccg cttggaataa       60
ggccggtgtg cgtttgtcta tatgttattt tccaccatat tgccgtcttt tggcaatgtg      120
agggcccgga aacctggccc tgtcttcttg acgagcattc ctaggggtct ttcccctctc      180
gccaaaggaa tgcaaggtct gttgaatgtc gtgaaggaag cagttcctct ggaagcttct      240
tgaagacaaa caacgtctgt agcgaccctt tgcaggcagc ggaacccccc acctggcgac      300
aggtgcctct gcggccaaaa gccacgtgta taagatacac ctgcaaaggc ggcacaaccc      360
cagtgccacg ttgtgagttg gatagttgtg gaaagagtca aatggctctc ctcaagcgta      420
ttcaacaagg ggctgaagga tgcccagaag gtaccccatt gtatgggatc tgatctgggg      480
cctcggtgca catgctttac atgtgtttag tcgaggttaa aaaaacgtct aggccccccg      540
aaccacgggg acgtggtttt cctttgaaaa acacgatgat aatatggcct cctttgtctc      600
tctgctcctg gtaggcatcc tattccatgc cacccaggcc ggcgccatgg gatatctaga      660
tctcgagctc gcgaaagctt                                                  680


<210> 4
<211> 4207
<212> DNA
<213> Artificial Sequence

<220>
<223> Synthetic

<400> 4
cggatccggc cattagccat attattcatt ggttatatag cataaatcaa tattggctat       60
tggccattgc atacgttgta tccatatcat aatatgtaca tttatattgg ctcatgtcca      120
acattaccgc catgttgaca ttgattattg actagttatt aatagtaatc aattacgggg      180
tcattagttc atagcccata tatggagttc cgcgttacat aacttacggt aaatggcccg      240
cctggctgac cgcccaacga cccccgccca ttgacgtcaa taatgacgta tgttcccata      300
gtaacgccaa tagggacttt ccattgacgt caatgggtgg agtatttacg gtaaactgcc      360
cacttggcag tacatcaagt gtatcatatg ccaagtacgc cccctattga cgtcaatgac      420
ggtaaatggc ccgcctggca ttatgcccag tacatgacct tatgggactt tcctacttgg      480
cagtacatct acgtattagt catcgctatt accatggtga tgcggttttg gcagtacatc      540



aatgggcgtg 


gatagcggtt 


tgactcacgg 


ggatttccaa 


gucEccaccc 


cac tgacgcc 


O VJ u 


aatgggagtt 


tgttttggca 


ccaaaatcaa 


cgggacc c cc 


^ ^ ^ ^ +~ /^■f" 
caaaaugucg 


C a a C a a C U C L. 


O O v 


gccccattga 


cgcaaatggg 


cggtaggcat 


gtacggcggg 


ay y L. c u a t. a L. 


aay Cayay c t. 


79 0 


cgtttagtga 


accgtcagat 


cgcctggaga 


cgccauccac 


y c cy u t. c cya 


C C U d t- dy d. 


7 R n 

/ O V 


agacaccggg 


accgatccag 


cctccgcggc 


cccaagcttc 


t cgacggatc 


cccgggaac t. 


Q A n 

Oft u 


caggacctca 


ccatgggatg 


gage tgt ate 




cyy cay caac 


dy cdt-dyy l. 


-7 U w 


gtccactccg 


aggtccaact 


ggtggagagc 


ggtggaggtg 


c tgcgcaacc 


cggccggccc 


D U 


ctgcgcctgt 


cctgctccgc 


atcuggcE tc 


gacuc caeca 


ca uau tyyai- 


gaguugggcg 


1 09 n 

X u ^ u 


agacaggcac 


ctggaaaagg 


tcttgagtgg 


ac cggagaaa 


c ucacccaga 


L.aycdyudcy 


J_U o u 


attaactatg 


cgccgtctct 


aaaggataga 


t t t acaatat 


cy cyayacaa 


cy ccdayddc 


J. J.rx U 


acattgttcc 


tgcaaatgga 


cagcctgaga 


cccgaagaca 


ccggggtcta 


u u 1 1. cgugca 


1 o n n 


agcctttact 


tcggcttccc 


ctggtttgct 


t a 1 1 ggggc c 


aagggacccc 


ggucaccguc 


1 *C 0 


tcctcagcct 


ccaccaaggg 


cccatcggtc 


ttccccctgg 


caccctcctc 


caagagcacc 


lion 


tctgggggca 


cagcggccct 


gggctgcctg 


gtcaaggact 


acccccccga 


accgy uyacy 


1 T p n 
± J o u 


gtgtcgtgga 


actcaggcgc 


cctgaccagc 


ggcgtgcaca 


ccttcccggc 


cgucc cacag 


1 A A O 


tcctcaggac 


tctactccct 


cagcagcgtg 


gtgaccgtgc 


cc tccagcag 


ct ugygcacc 


1 R n n 
± J u u 


cagacctaca 


tctgcaacgt 


gaatcacaag 


cccagcaaca 


ccaaggtgga 


caagagagt t 


±DOU 


gagcccaaat 


cttgtgacaa 


aactcacaca 


tgcccaccgt 


gcccagcacc 


tgaact cc tg 




gggggaccgt 


cagtcttcct 


cttcccccca 


aaacccaagg 


acaccctcat 


gatctcccgg 


1 Q n 
-Lb o U 


acccctgagg 


tcacatgcgt 


ggtggtggac 


gtgagccacg 


aagaccctga 


ggucaagc uc 


1 7 A O 


aactggtacg 


tggacggcgt 


ggaggtgcat 


aatgccaaga 


caaagccgcg 


ggaggagc ag 


1 Q 0 n 
± o u u 


tacaacagca 


cgt accgtgt 


ggt cagcgt c 


c ucaccgtcc 


cgcaccayya 


ccyyc uyadt. 


X o o u 


ggcaaggagt 


acaagtgcaa 


ggt c t ccaac 


aaagcccL.cc 


cagcccccac 


cyayaaaacc 


1 QO n 


atctccaaag 


ccaaagggca 


gccccgagaa 


ccacaggtgt 


acaccc tgcc 


cccd t-cccyy 


1 Q p n 

X i7 o u 


gaggagatga 


ccaagaacca 


ggtcagcctg 


acctgcctgg 


tcaaaggctt 


ctat cccagc 


0 n A n 


gacatcgccg 


tggagtggga 


gagcaatggg 


cagccggaga 


acaactacaa 


gaccacgcct 


z X U U 


cccgtgctgg 


actccgacgg 


ctccttcttc 


ctctatagca 


agctcaccgt 


ggacaagagc 


z Xo U 


aggtggcagc 


aggggaacgt 


cttctcatgc 


tccgtgatgc 


acgaggctct 


gcacaaccac 


2220 


tacacgcaga 


agagcctctc 


cctgtctccc 


gggaaatgaa 


agccgaattc 


gcccctctcc 


2280 


ctcccccccc 


cctaacgtta 


ctggccgaag 


ccgcttggaa 


taaggccggt 


gtgcgtttgt 


2340 


ctatatgtta 


ttttccacca 


tattgccgtc 


ttttggcaat 


gtgagggccc 


ggaaacctgg 


2400 








ccctgtcttc 


ttgacgagca 


ttcctagggg 


tctttcccct 


ctcgccaaag 


gaatgcaagg 


o A n 




tctgttgaat 


gtcgtgaagg 


aagcagttcc 


t c t ggaagc t 


ucu ugaagac 


aaacaacgtc 




c 

J 


tgtagcgacc 


ctttgcaggc 


agcggaaccc 


cccacctggc 


gacaggugcc 


uccgcggcca 


9 cr p n 
Z D 0 U 




aaagccacgt 


gtataagata 


cacctgcaaa 


ggcggcacaa 


ccccagtgcc 


acgu cgcgag 


"5 A n 

Z O ft u 


10 


ttggatagtt 


gtggaaagag 


tcaaatggct 


ctcctcaagc 


gt att caaca 


^9999^t.gaa 


o '7 n n 
z / u u 


ggatgcccag 


aaggtacccc 


attgtatggg 


atctgatctg 


gggcc tcggt 


gcacaugc tu 


o -7 zr ri 
Z / o U 




tacatgtgtt 


tagtcgaggt 


taaaaaaacg 


tctaggcccc 


ccgaaccacg 


gggacgtggt 


n Q O A 

Z oz U 


1 c 

i J 


tttcctttga 


aaaacacgat 


gataatatgg 


cctcctttgt 


^ ^ ^ ^^^1 4— 

ccc ticugccc 


ccggcaggca 


O P Q O 
Z O 0 U 




tcctattcca 


tgccacccag 


gccgacatcc 


agctgaccca 


gagcccaagc 


agcccgagcg 


O OA O 

z y4 u 


20 


ccagcgtggg 


tgacagagtg 


accatcacct 


gtaaggccag 


tcaggatgtg 


ggtactt ctg 


"5 fi n 

J u U U 


tagcctggta 


ccagcagaag 


ccaggtaagg 


ctccaaagct 


gctgatctac 


tggacat cca 


"3 r\ c A 


a 


cccggcacac 


tggtgtgcca 


agcagattca 


gcggtagcgg 


tagcggtacc 


gacttcacct 


3120 




tcaccatcag 


cagcctccag 


ccagaggaca 


tcgccaccta 


ctactgccag 


caatatagcc 


•3 T p 
J J-bU 


SI 


tctatcggtc 


gttcggccaa 


gggaccaagg 


tggaaat caa 


acgaac cgcg 


gc uy caccaL. 


0 A n 
J z^ u 




ctgtcttcat 


cttcccgcca 


tctgatgagc 


agt tgaaatc 


tggaactgcc 


-h/-•4-/-<rl-4-r<r-l-/-Tf- 
uC ugc tgugc 


"5 1 n n 


m 


gcctgctgaa 


taacttctat 


cccagagagg 


ccaaagtaca 


gtggaaggtg 


gataacgccc 


1 c n 
J J b u 


H 


tccaatcggg 


taactcccag 


gagagtgtca 


cagagcagga 


cagcaaggac 


agcacctaca 


3420 


3^ 


gcctcagcag 


caccctgacg 


ctgagcaaag 


cagactacga 


gaaacacaaa 


gtctacgcct 




Hi 


gcgaagtcac 


ccatcagggc 


ctgagctcgc 


ccgtcacaaa 


gagcttcaac 


aggggagagt 


3540 




gttagagatc 


taggcctcct 


aggtcgacat 


cgataaaata 


a aaga u u t u a 


tttagtctcc 


"3 A A 




agaaaaaggg 


gggaatgaaa 


gaccccacct 


gtaggtttgg 


caagctagct 


taagtaacgc 


-3 /r c n 
J b b U 




cattttgcaa 


ggcatggaaa 


aatacataac 


tgagaataga 


gaagttcaga 


t caaggt cag 


1 "7 0 n 
J / z u 




gaacagatgg 


aacagctgaa 


tatgggccaa 


acaggatatc 


tgtggtaagc 


agttcctgcc 


"3 T Q n 

J /oU 




ccggctcagg 


gccaagaaca 


gatggaacag 


ctgaatatgg 


gccaaacagg 


atatctgtgg 


0 0 ^ A 


50 


taagcagttc 


ctgccccggc 


tcagggccaa 


gaacagatgg 


tccccagatg 


cggtccagcc 


Q A A 

3 900 


ctcagcagtt 


tctagagaac 


catcagatgt 


ttccagggtg 


ccccaaggac 


ctgaaatgac 


3 950 




cctgtgcctt 


atttgaacta 


accaatcagt 


tcgcttctcg 


cttctgttcg 


cgcgcttctg 


yl A 0 A 

4L)z U 


55 


ctccccgagc 


tcaataaaag 


agcccacaac 


ccctcactcg 


gggcgccagt 


cctccgattg 


4080 




actgagtcgc 


ccgggtaccc 


gtgtatccaa 


taaaccctct 


tgcagttgca 


tccgacttgt 


4140 


60 


ggtctcgctg 


ttccttggga 


gggtctcctc 


tgagtgattg 


actacccgtc 


agcgggggtc 


4200 


tttcatt 












4207 








<210> 5 








<211> 4210 




5 


<212> DNA 








<213> Artificial Sequence 


1 A 


<220> 








<223> Synthetic 






<400> 5 






15 


ggatccggcc 


attagccata 


ttattcattg 


ggccattgca 


tacgttgtat 


ccatatcata 




cattaccgcc 


atgttgacat 


tgattattga 


zU 


cattagttca 


tagcccatat 


atggagttcc 




ctggctgacc 


gcccaacgac 


ccccgcccat 




taacgccaat 


agggactttc 


cattgacgtc 




acttggcagt 


acatcaagtg 


tatcatatgc 


Q 


gtaaatggcc 


cgcctggcat 


tatgcccagt 


ill 


agtacatcta 


cgtattagtc 


atcgctatta 




atgggcgtgg 


atagcggttt 


gactcacggg 
ZJ -J ^ ^ 


\"": 


atgggagttt 


gttttggcac 


caaaatcaac 




ccccattgac 


gcaaatgggc 


ggtaggcatg 




gtttagtgaa 


ccgtcagatc 


gcctggagac 




gacaccggga 


ccgatccagc 


ctccgcggcc 




aggacctcac 


catgggatgg 


agctgtatca 


45 


tccactccca 


ggtccagctg 


gtccaatcag 




tgaaggtctc 


ctgcaaggct 


tctggctaca 




ggcaggcacc 


tggacagggt 


ctggaatgga 


jU 


ctgagtacaa 


tcagaacttc 


aaggacaagg 




cagcctacat 


ggagctgagc 


agcctgaggt 


55 


gaagggatat 


tactacgttc 


tactggggcc 


ccaccaaggg 


cccatcggtc 


ttccccctgg 




cagcggccct 


gggctgcctg 


gtcaaggact 


60 


actcaggcgc 


cctgaccagc 


ggcgtgcaca 




tctactccct 


cagcagcgtg 


gtgaccgtgc 



gttatatagc 


ataaatcaat 


attggctatt 


60 


atatgtacat 


ttatattggc 


tcatgtccaa 


120 


ctagttatta 


atagtaatca 


attacggggt 


180 


gcgttacata 


acttacggta 


aatggcccgc 


240 


tgacgtcaat 


aatgacgtat 


gttcccatag 


300 


aatgggtgga 


gtatttacgg 


taaactgccc 


360 


caagtacgcc 


ccctattgac 


gtcaatgacg 


420 


acatgacctt 


atgggacttt 


cctacttggc 


480 


ccatggtgat 


gcggttttgg 


cagtacatca 


540 


gatttccaag 


tctccacccc 


attgacgtca 


600 


gggactttcc 


aaaatgtcgt 


aacaactccg 


660 


tacggtggga 


ggtctatata 


agcagagctc 


720 


gccatccacg 


ctgttttgac 


ctccatagaa 


780 


ccaagcttct 


cgacggatcc 


ccgggaattc 


840 


tcctcttctt 


ggtagcaaca 


gctacaggtg 


900 


gggctgaagt 


caagaaacct 


gggtcatcag 


960 


cctttactag 


ctactggctg 


cactgggtca 


1020 


ttggatacat 


taatcctagg 


aatgattata 


1080 


ccacaataac 


tgcagacgaa 


tccaccaata 


1140 


ctgaggacac 


ggcattttat 


ttttgtgcaa 


1200 


aaggcaccac 


ggtcaccgtc 


tcctcagcct 


1260 


caccctcctc 


caagagcacc 


tctgggggca 


1320 


acttccccga 


accggtgacg 


gtgtcgtgga 


1380 


ccttcccggc 


tgtcctacag 


tcctcaggac 


1440 


cctccagcag 


cttgggcacc 


cagacctaca 


1500 








tctgcaacgt 


gaatcacaag 


cccagcaaca 


ccaaggcgga 


caagagagt. c 


gagcccaa.a.L. 


1 ^^C\ 
± o 0 u 




cttgtgacaa 


aactcacaca 


tgcccaccgt 


gcccagcacc 


tgaac t cctg 


99gyg^c;cgT: 


± 0 z u 


J 


cagtcttcct 


cttcccccca 


aaacccaagg 


acaccctcat 


gat c t cccgg 


acccctgagg 


1680




tcacatgcgt 


ggtggtggac 


gtgagccacg 


aagaccctga 


ggtcaagttc 


aacuggcacg 


1740




tggacggcgt 


ggaggtgcat 


aatgccaaga 


caaagccgcg 


ggag gage ag 


L.acaa.ca.gco. 


1800


^^^^ ^^^^^^^^^ 

eg uaccy cgt. 


ggtcagcgcc 


c u c o. (J c g u c c 


ugcducaggo. 




yy (^cLctyy cL^ L. 


1860




acaagtgcaa 


ggtctccaac


aaagccctcc


cagcccccat 


cgagaaaacc


a U C U CCa.a.a.g 


1920


ccaaagggca 


gccccgagaa 


ccacaggtgt 


acaccctgcc 


cccatcccgg


gaggagatga


1980




ccaagaacca 


ggtcagcctg 


acctgcctgg 


tcaaaggctt 


ctatcccagc 


gacatcgccg 


2040




tggagtggga 


gagcaatggg 


cagccggaga 


acaactacaa 


gaccacgcct 


cccgtgctgg 


2100


actccgacgg 


ctccttcttc 


ctctatagca 


agctcaccgt 


ggacaagagc 


aggtggcagc 




2160


aggggaacgt 


cttctcatgc 


tccgtgatgc 


acgaggctct 


gcacaaccac 


tacacgcaga 


2220 




agagcctctc 


cctgtctccc 


gggaaatgaa 


agccgaattc 


gcccctctcc


cccccccccc 


2280




cctaacgtta


ctggccgaag 


ccgcttggaa


taaggccggt 


gtgcgtttgt


c uauaugc ua 


2340




ttttccacca 


tattgccgtc 


ttttggcaat


gtgagggccc 


ggaaacctgg


ccctgtcttc


2400
















ttgacgagca 


ttcctagggg 


uCuuUCCCCC 


ctcgccaaag 


gaatgcaagg 


tctgttgaat 


2460




gtcgtgaagg 


aagcagttcc 


tctggaagct 


tcttgaagac 


aaacaacgtc 


tgtagcgacc 


2520




ctttgcaggc 


agcggaaccc 


cccacctggc 


gacaggtgcc 


tctgcggcca 


aaagccacgt 


2580




gtataagata 


cacctgcaaa 


ggcggcacaa 


ccccagtgcc 


acgttgtgag


ttggatagtt


2640




gtggaaagag 


tcaaatggct 


ctcctcaagc


gtattcaaca 


aggggctgaa 


ggatgcccag 


2700


aaggtacccc 


attgtatggg 


atctgatctg 


gggcctcggt 


gcacatgctt 


tacatgtgtt 


2760




tagtcgaggt 


taaaaaaacg 


tctaggcccc 


ccgaaccacg 


gggacgtggt 


tu ucccucga 


2820




aaaacacgat 


gataatatgg 


cctcctttgt


ctctctgctc 


ctggtaggca 


tcctattcca 


2880




tgccacccag 


gccgacatcc 


agctgaccca 


gtctccatca 


tctctgagcg 


catctgttgg 


2940




agatagggtc 


actatgagct 


gtaagtccag 


tcaaagtgtt 


ttatacagtg 


caaatcacaa


3000


gaactacttg 


gcctggtacc 


agcagaaacc 


agggaaagca 


cctaaactgc 


tgatctactg 


3060




ggcatccact 


agggaatctg 


gtgtcccttc 


gcgattctct 


ggcagcggat 


ctgggacaga 


3120




ttttactttc 


accatcagct 


ctcttcaacc 


agaagacatt 


gcaacatatt 


attgtcacca 


3180 




atacctctcc 


tcgtggacgt 


tcggtggagg 


gaccaaggtg 


cagatcaaac 


gaactgtggc 


3240 




tgcaccatct 


gtcttcatct 


tcccgccatc 


tgatgagcag 


ttgaaatctg 


gaactgcctc 


3300 


tgttgtgtgc 


ctgctgaata 


acttctatcc 


cagagaggcc 


aaagtacagt 


ggaaggtgga 


3360 








taacgccctc 


caatcgggta 


actcccagga 


gagtgtcaca 


gagcaggaca


gcaaggacag 


3420




cacctacagc 


ctcagcagca 


ccctgacgct 


gagcaaagca 


gactacgaga 


aacacaaagt 


3480




ctacgcctgc 


gaagtcaccc 


atcagggcct 


gagctcgccc 


gtcacaaaga 


gcttcaacag 


3540




gggagagtgt 


tagagatcta 


ggcctcctag 


gtcgacatcg 


ataaaataaa 


agattttatt


3600




tagtctccag 


aaaaaggggg 


gaatgaaaga 


ccccacctgt 


aggtttggca 


agctagctta


3660


agtaacgcca 


ttttgcaagg 


catggaaaaa 


tacataactg 


agaatagaga 


agttcagatc


3720




aaggtcagga


acagatggaa 


cagctgaata 


tgggccaaac 


aggatatctg 


tggtaagcag 


3780




ttcctgcccc 


ggctcagggc 


caagaacaga 


tggaacagct 


gaatatgggc 


caaacaggat


3840




atctgtggta 


agcagttcct 


gccccggctc 


agggccaaga 


acagatggtc 


cccagatgcg 


3900




gtccagccct 


cagcagtttc 


tagagaacca 


tcagatgttt 


ccagggtgcc 


ccaaggacct 


3960


gaaatgaccc 


tgtgccttat 


ttgaactaac 


caatcagttc 


gcttctcgct 


tctgttcgcg 


4020




cgcttctgct 


ccccgagctc 


aataaaagag 


cccacaaccc 


ctcactcggg 


gcgccagtcc 



4080 




tccgattgac 


tgagtcgccc 


gggtacccgt 


gtatccaata 


aaccctcttg 


cagttgcatc 


4140 




cgacttgtgg 


tctcgctgtt 


ccttgggagg 


gtctcctctg 


agtgattgac 


tacccgtcag 


4200




gtctttcatt 












4210 


<210> 6

<211> 5732

<212> DNA

<213> Artificial Sequence

<220>

<223> Synthetic

<400> 6
cgagcttggc 


agaaatggtt 


gaactcccga 


gagtgtccta 


cacctagggg 


agaagcagcc 


60


aaggggttgt 


ttcccaccaa 


ggacgacccg 


tctgcgcaca 


aacggatgag 


cccatcagac 


120




aaagacatat 


tcattctctg 


ctgcaaactt 


ggcatagctc 


tgctttgcct 


ggggctattg 


180




ggggaagttg 


cggttcgtgc 


tcgcagggct 


ctcacccttg 


actctttcaa 


taataactct


240






tctgtgcaag 


attacaatct 


aaacaattcg 


gagaactcga 


ccttcctcct 


gaggcaagga


300




ccacagccaa 


cttcctctta 


caagccgcat 


cgattttgtc 


cttcagaaat 


agaaataaga 


360


atgcttgcta 


aaaattatat 


ttttaccaat 


aagaccaatc 


caataggtag 


attattagtt 


420 




actatgttaa 


gaaatgaatc 


attatctttt 


agtactattt 


ttactcaaat 


tcagaagtta 


480 




gaaatgggaa 


tagaaaatag 


aaagagacgc 


tcaacctcaa 


ttgaagaaca 


ggtgcaagga 


540 




ctattgacca 


caggcctaga 


agtaaaaaag 


ggaaaaaaga 


gtgtttttgt 


caaaatagga 


600 








gacaggtggt 


ggcaaccagg 


gacttatagg 


ggaccttaca 


tctacagacc 


aacagatgcc 


660 




cccttaccat 


atacaggaag 


atatgactta 


aattgggata 


ggtgggttac 


agtcaatggc 


720


tataaagtgt 


tatatagatc 


cctccccttt 


cgtgaaagac 


tcgccagagc 


tagacctcct 


780




tggtgtatgt 


tgtctcaaga 


aaagaaagac 


gacatgaaac 


aacaggcaca 


uga L. u a c a c t. 


840




tatctaggaa 


caggaatgca 


cttttgggga 


aagattttcc 


ataccaagga 


ggggacagtg 


900


gctggactaa 


tagaacatta 


ttctgcaaaa 


acttatggca 


cgagutat ca 


tgattagcct 


960




tgatttgccc 


aaccttgcgg 


ttcccaaggc 


ttaagtaagt 



ttttggttac 



aaactgtuct: 


1020




taaaacaagg 


atgtgagaca 


agtggtttcc 


tgacttggtt 


tggtatcaaa 


ggttctgatc 


1080




tgagctctga 


gtgttctatt 


ttcctatgtt 


cttttggaat 


ttatccaaat 


cttatgtaaa 


1140




tgcttatgta 


aaccaagata 


taaaagagtg 



ccgact ucc L 


gagtaaactt 


gcaacagtcc 


1200


taacattcac 


ctcttgtgtg 


uLugcgtccg 


ttcgccatcc 


cgtctccgct 


cgtcacttat 


1260




ccttcacttt 


ccagagggtc 


cccccgcaga 


ccccggcgac 


cctcaggtcg 


gccgactgcg 


1320




gcagctggcg 


cccgaacagg 


gaccctcgga 


taagtgaccc 



ttgtctttat 


ttccactatt 


1380




ttgtgttcgt 


cttgttttgt 


ctctatcttg 


tctggctatc 


atcacaagag 


cggaacggac 


1440




tcacctcagg 


gaaccaagct 


agcccggggt 


cgacggatcc 


gattacttac 


tggcaggtgc 


1500 




tgggggcttc 


cgagacaatc 


gcgaacatct 


acaccacaca 


acaccgcctc 


gaccagggtg 


1560




agatatcggc 


cggggacgcg 


gcggtggtaa 


ttacaagcga 


gatccgatta 


cttactggca 


1620 




ggtgctgggg 


gcttccgaga 


caatcgcgaa 


catctacacc 


acacaacacc 


gcctcgacca 



1680 




gggtgagata 


tcggccgggg 


acgcggcggt 


ggtaattaca 


agcgagatcc 


ccgggaattc 


1740 




aggacctcac 


catgggatgg 


agctgtatca 


tcctcttctt 


ggtagcaaca 


gctacaggtg 


1800 




















tccactccga 


ggtccaactg 


gtggagagcg 


gtggaggtgt 


tgtgcaacct 


ggccggtccc 


1860 




tgcgcctgtc 


ctgctccgca 


tctggcttcg 


atttcaccac 


atattggatg 


agttgggtga 


1920




gacaggcacc 


tggaaaaggt 


cttgagtgga 


ttggagaaat 


tcatccagat 


agcagtacga 


1980




ttaactatgc 


gccgtctcta 


aaggatagat 


ttacaatatc 


gcgagacaac 


gccaagaaca 



2040 




catcgttcct 


gcaaatggac 


agcctgagac 


ccgaagacac 


cggggtctat 


ttttgtgcaa 



2100 


gcctttactt 


cggcttcccc 


tggtttgctt 


attggggcca 


agggaccccg 


gtcaccgtct 


2160 




cctcagcctc 


caccaagggc 


ccatcggtct 


tccccctggc 


accctcctcc 


aagagcacct 


2220 




ctgggggcac 


agcggccctg 


ggctgcctgg 


tcaaggacta 


cttccccgaa 


ccggtgacgg 


2280 




tgtcgtggaa 


ctcaggcgcc 


ctgaccagcg 


gcgtgcacac 


cttcccggct 


gtcctacagt 


2340 




cctcaggact 


ctactccctc 


agcagcgtgg 


tgaccgtgcc 


ctccagcagc 


ttgggcaccc 


2400 


agacctacat 


ctgcaacgtg 


aatcacaagc 


ccagcaacac 


caaggtggac 


aagagagttg 


2460 
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agcccaaatc ttgtgacaaa actcacacat gcccaccgtg cccagcacct gaactcctgg 2520 

ggggaccgtc agtcttcctc ttccccccaa aacccaagga caccctcatg atctcccgga 2580 

cccctgaggt cacatgcgtg gtggtggacg tgagccacga agaccctgag gtcaagttca 2640

actggtacgt ggacggcgtg gaggtgcata atgccaagac aaagccgcgg gaggagcagt 2700 

acaacagcac gtaccgtgtg gtcagcgtcc tcaccgtcct gcaccaggac tggctgaatg 2760 

gcaaggagta caagtgcaag gtctccaaca aagccctccc agcccccatc gagaaaacca 2820

tctccaaagc caaagggcag ccccgagaac cacaggtgta caccctgccc ccatcccggg 2880

aggagatgac caagaaccag gtcagcctga cctgcctggt caaaggcttc tatcccagcg 2940

acatcgccgt ggagtgggag agcaatgggc agccggagaa caactacaag accacgcctc 3000

ccgtgctgga ctccgacggc tccttcttcc tctatagcaa gctcaccgtg gacaagagca 3060

ggtggcagca ggggaacgtc ttctcatgct ccgtgatgca cgaggctctg cacaaccact 3120

acacgcagaa gagcctctcc ctgtctcccg ggaaatgaaa gccgaattcg cccctctccc 3180

tccccccccc ctaacgttac tggccgaagc cgcttggaat aaggccggtg tgcgtttgtc 3240

tatatgttat tttccaccat attgccgtct tttggcaatg tgagggcccg gaaacctggc 3300 

cctgtcttct tgacgagcat tcctaggggt ctttcccctc tcgccaaagg aatgcaaggt 3360 

ctgttgaatg tcgtgaagga agcagttcct ctggaagctt cttgaagaca aacaacgtct 3420 

gtagcgaccc tttgcaggca gcggaacccc ccacctggcg acaggtgcct ctgcggccaa 3480

aagccacgtg tataagatac acctgcaaag gcggcacaac cccagtgcca cgttgtgagt 3540

tggatagttg tggaaagagt caaatggctc tcctcaagcg tattcaacaa ggggctgaag 3600

gatgcccaga aggtacccca ttgtatggga tctgatctgg ggcctcggtg cacatgcttt 3660 

acatgtgttt agtcgaggtt aaaaaaacgt ctaggccccc cgaaccacgg ggacgtggtt 3720

ttcctttgaa aaacacgatg ataatatggc ctcctttgtc tctctgctcc tggtaggcat 3780 

cctattccat gccacccagg ccgacatcca gctgacccag agcccaagca gcctgagcgc 3840

cagcgtgggt gacagagtga ccatcacctg taaggccagt caggatgtgg gtacttctgt 3900

agcctggtac cagcagaagc caggtaaggc tccaaagctg ctgatctact ggacatccac 3960 

ccggcacact ggtgtgccaa gcagattcag cggtagcggt agcggtaccg acttcacctt 4020 

caccatcagc agcctccagc cagaggacat cgccacctac tactgccagc aatatagcct 4080

ctatcggtcg ttcggccaag ggaccaaggt ggaaatcaaa cgaactgtgg ctgcaccatc 4140

tgtcttcatc ttcccgccat ctgatgagca gttgaaatct ggaactgcct ctgttgtgtg 4200

cctgctgaat aacttctatc ccagagaggc caaagtacag tggaaggtgg ataacgccct 4260

ccaatcgggt aactcccagg agagtgtcac agagcaggac agcaaggaca gcacctacag 4320 
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cctcagcagc 


accctgacgc 


tgagcaaagc 


agactacgag 


aaacacaaag 


tctacgcctg 


4380




cgaagtcacc 


catcagggcc


tgagctcgcc 


cgtcacaaag 


agcttcaaca 


ggggagagtg 


4440




ttagagatcc 


c c cgggc c gc 


aggaa 1 1 cga 


tatcaagctt


atcgataatc


aacctctgga


4500




ttacaaaatt 


cgt-gaaagac 


tgactggtat 


tcttaactat 


gttgctcctt 


c cacgcuatg 


4560




tggatacgct 


gctttaatgc 



Cuctgtacca 


cgcuauugcu 


tcccgtatgg 


ccutcat: tut 


4620


ctcctccttg 


tataaatcct 


ggttgctgtc 


tctttatgag 


gagttgtggc 


ccgttgtcag 


4680




gcaacgtggc 


gtggtgtgca 


ctgtgtttgc


tgacgcaacc 


cccactggtt


ggggcattgc


4740




caccacctgt 


c age c c c u t. u 


ccgggacttt


cgctttcccc


ctccctattg


ccacggcgga 


4800




actcatcgcc 


gcctgccttg 


cccgctgctg 


gacaggggct 


cggctgttgg 


gcactgacaa 


4860




ttccgtggtg 


ttgtcgggga 


aatcatcgtc 


ctttccttgg 


ctgctcgcct 


gtgttgccac 


4920


ctggattctg 


cgcgggacgt 


ccttctgcta 


cgtcccttcg 


gccctcaatc 


cagcggacct 


4980




tccttcccgc 


ggcctgctgc 


cggctctgcg 


gcctcttccg 


cgtcttcgcc 


ttcgccctca 


5040




gacgagtcgg 


atcucccuct: 


gggccgcct c 


cccgcctgat 


cgataccgtc 


aacatcgata 


5100




aaataaaaga 


t t u cat tcag 


tctccagaaa 


aaggggggaa 


tgaaagaccc 


cacctgtagg 


5160




tttggcaagc 


cagcc uaagc 


aacgccatt t 


tgcaaggcat 


ggaaaaatac 


ataactgaga 


5220




atagagaagt 


tcagatcaag 


gtcaggaaca 


gatggaacag 


ctgaatatgg 


gccaaacagg 


5280




atatctgtgg 


taagcagttc 


ctgccccggc 


tcagggccaa 


gaacagatgg 


aacagctgaa 


5340 




tatgggccaa 


acaggatatc


tgtggtaagc 


agttcctgcc 


ccggctcagg 


gccaagaaca 


5400


gatggtcccc 


agatgcggtc 


cagccctcag 


cagtttctag 


agaaccatca 


gatgtttcca 


5460 




gggtgcccca 


aggacctgaa 


atgaccctgt 


gccttatttg 


aactaaccaa 


tcagttcgct 


5520




















tctcgcttct 


gttcgcgcgc 


ttctgctccc 


cgagctcaat 


aaaagagccc 


acaacccctc 


5580 




actcggggcg 


ccagtcctcc 


gattgactga 


gtcgcccggg 


tacccgtgta 


tccaataaac 


5640 




cctcttgcag 


ttgcatccga 


cttgtggtct 


cgctgttcct 


tgggagggtc 


tcctctgagt 


5700 




gattgactac 


ccgtcagcgg 


gggtctttca 


tt 






5732 



<210> 7

<211> 9183

<212> DNA

<213> Artificial Sequence

<220>

<223> Synthetic

<400> 7 
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tgcaatgcgg 
acatcgcatc 
ggacgaagag 
gcccgacggc 
ggaaaatggc 
tcaggacata 
ccgcttcctc 
ccttcttgac 
cccaacctgc 
ggaatcgttt 
ttcttcgccc 
gacgacctcg 
ggctatccgc 
gtcgaggcgg 
tgaggaatac 
agtccaaccc 
cgtccgagaa 
tgaagctcca 
tactgggaaa 
aatcaccaac 
aagcctggcg 
ctgaactgat 
ctgattctga 
attatttact 
tagagagaat 
caatatataa 
accatttaga 
gttcagaatc 
atggtaaagt 
cccctggaga 
gaggagcctt 



cggctgcata 
gagcgagcac 
catcaggggc 
gaggatctcg 
cgcttttctg 
gcgttggcta 
gtgctttacg 
gagttcttct 
catcacgaga 
tccgggacgc 
accccgggct 
cggagttcta 
gcatccatgc 
atcctagaac 
cgattctctc 
agataacgat 
taacgagtgg 
atactttggc 
gattgaaggc 
tcgatggaca 
tgctgcagtc 
agtgtaatcc 
agagttgtag 
taggatgggt 
gagccctggc 
ccaatatatt 
actagtatcc 
ttgttttata 
gtctgcctgc 
aggaaatggc 
gtaagctaca 



cgcttgatcc 
gtactcggat 
tcgcgccagc 
tcgtgaccca 
gattcatcga 
cccgtgatat 
gtatcgccgc 
gagcgggact 
tttcgattcc 
cggctggatg 
cgatcccctc 
ccggcagtgc 
ccccgaactg 
tagcgaaaat 
attaacatat 
catatacatg 
atcagtcctg 
cacctgatgc 
aggaggagaa 
tgagtttgag 
catggggttg 
atggtacaga 
gatataaaag 
acccactgca 
ataccagaag 
ggttatatag 
taaactctac 
ggctctaggt 
aatgtgggtg 
aacccactct 
gtccatggga 



ggctacctgc 
ggaagccggt 
cgaactgttc 
tggcgatgcc 
ctgtggccgg 
tgctgaagag 
tcccgattcg 
ctggggttcg 
accgccgcct 
atcctccagc 
gcgagttggt 
aaatccgtcg 
caggagtggg 
gcaagagcaa 
tcaggccagt 
gttctctcca 
ggtggtcatt 
gaagaactga 
gggatgacag 
caagcttcca 
caaagagttg 
atataggata 
tttagaatac 
atataagaaa 
ctaacagcta 
catgaagctt 
atgttccagg 
gtatattgtg 
atctgggttc 
agtactctta 
ttgcaaagag 



ccattcgacc 
cttgtcgatc 
gccaggctca 
tgcttgccga 
ctgggtgtgg 
cttggcggcg 
cagcgcatcg 
aaatgaccga 
tctatgaaag 
gcggggatct 
tcagctgctg 
gcatccagga 
gaggcacgat 
agacgaaaac 
tatctgggct 
gaggttcatt 
gaaaggactg 
ctcatgtgat 
aggatggaag 
99^9ttggta 
gacactactg 
aaaaagagga 
ctttagtttg 
tcaggcttta 
ttggttatag 
gatgccagca 
acactgatct 
gggcttccct 
gatccctggc 
cctggaaaat 
ttgaacacaa 



accaagcgaa 
aggatgatct 
aggcgcgcat 
atatcatggt 
cggaccgcta 
aatgggctga 
ccttctatcg 
ccaagcgacg 
gttgggcttc 
catgctggag 
cctgaggctg 
aaccagcagc 
ggccgctttg 
atgccacaca 
taaaagcaga 
actgaacact 
atgctgaagt 
aagaccctga 
^gttggatgg 
^tgggcaggg 
agtgactgaa 
agagtttgcc 
gaagtcttaa 
gagactgatg 
ctgttataac 
atttgaagga 
taaagctcag 
ggtggctcag 
ttgggaagat 
tccatggaca 
ctgagcaact 
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aagcacagca 


cagtacagta 


tacacctgtg 


aggtgaagtg 


aagtgaaggt 


tcaatgcagg 


3780




gtctcctgca 


ttgcagaaag 


attctttacc 


atctgagcca 


ccagggaagc 


ccaagaauac 


3840




tggagtgggt 


agcctattcc 


ttctccaggg 


gatcttccca 


tcccaggaat 


tgaactggag 


3900




tctcctgcat 


ttcaggtgga 


ttcttcacca 


gctgaactac 


caggtggata 


ctactccaat


3960




attaaagtgc 


ttaaagtcca 


gttttcccac 


ctttcccaaa 


aaggttgggt


cacucc u u uC 


4020


taaccttctg 


tggcctactc 


tgaggctgtc 


tacaagctta


cauat-u cacg 


aacacau u ua 


4080




ttgcaagttg 


ttagttttag 


atttacaatg 


tggtatctgg 


ctactcagug 


gtattggtgg 


4140




ttggggatgg 


ggaggctgat 


agcatctcag 


agggcagcta


gatactgtca 


tacacacttt 


4200




tcaagttctc 


catttttgtg


aaatagaaag 


tzcuctggatc 



taagtcacac 


gtgattctca 


4260




gtctctgtgg 


tcatattcta 


ttctactcct 


gaccactcaa 


caaggaacca 


agatatcaag 


4320


ggacacttgt 


tttgtttcat 


gcctgggttg 


agtgggccat 


gacatatgtt 


ctgggccttg


4380




ttacatggct 


ggattggttg 


gacaagtgcc 


agctctgatc 


ctgggactgt 


ggcatgtgat 


4440 




gacatacacc 


ccctctccac 


attctgcatg 


tctctagggg 


ggaaggggga 


agctcggtat 


4500


agaaccttta


utgtattiucc 


tgattgcctc 


acuccccatia 


ttgcccccat 


gcccttcttt


4560




gttcctcaag 


taaccagaga 


cagtgcttcc


cagaaccaac 


cctacaagaa 


acaaagggct 


4620


















aaacaaagcc 


aaatgggaag 


caggatcatg 


gtttgaactc 


tttctggcca 


gagaacaata 


4680




cctgctatgg 


actagatact 




aaggaaaagt 


agggtgaatt 


atggaaggaa 


4740




gctggcaggc 


tcagcgtutc 


tgtcttggca 


tgaccagtct 


ctcttcattc 


tcttcctaga 


4800


tgtagggctt 


ggtaccagag 


cccctgaggc 



tttctgcatg 


aatataaata 


tatgaaactg 


4860




agtgatgctt 


ccatttcagg 


ttcccggggg 


cgccgaattc 


gagctcggta


cccggggatc 


4920


tcgacggatc 


cgattactta 


ctggcaggtg 


ctgggggctt 


ccgagacaat 


cgcgaacatc 


4980




tacaccacac 


aacaccgcct 


cgaccagggt 


gagatatcgg 


ccggggacgc 


99cggtggta 


5040




attacaagcg 


agatccgatt 


acttactggc 


aggtgctggg 


ggcttccgag 


acaatcgcga 


5100




acatctacac 


cacacaacac 


cgcctcgacc 


agggtgagat 


atcggccggg 


gacgcggcgg 


5160




tggtaattac 


aagcgagatc 


cccgggaatt 


caggacctca 


ccatgggatg 


gagctgtatc 


5220


atcctcttct 


tggtagcaac 


agctacaggt 


gtccactccg 


aggtccaact 


ggtggagagc 


5280




ggtggaggtg 


ttgtgcaacc 


tggccggtcc 


ctgcgcctgt 


cctgctccgc 


atctggcttc 



5340 




gatttcacca 


catattggat 


gagttgggtg 


agacaggcac 


ctggaaaagg 


tcttgagtgg 


5400 




attggagaaa 


ttcatccaga 


tagcagtacg 


attaactatg 


cgccgtctct 


aaaggataga 


5460 




tttacaatat 


cgcgagacaa 


cgccaagaac 


acattgttcc 


tgcaaatgga 


cagcctgaga 


5520 


cccgaagaca 


ccggggtcta 


tttttgtgca 


agcctttact 


tcggcttccc 


ctggtttgct 


5580 
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10 






tattggggcc 


aagggacccc 


ggtcaccgtc 


tcctcagcct 


ccaccaaggg 


cccaucgguc 


5640


ttccccctgg 


caccctcctc 


caagagcacc 


tctgggggca 


cagcggcccu 


gggc ugcc t.g 


5700


gtcaaggact 


acttccccga 


accggtgacg 


gtgtcgtgga 


actcaggcgc 


cccgaccagc 


5760


ggcgtgcaca 


ccttcccggc 


tgtcctacag 


tcctcaggac


c c u ac u c c c L- 


cagcagcgug 


5820


gtgaccgtgc 


cctccagcag 


cttgggcacc 


cagacct aca 


uccgcaacgt 



ga.a ucacaag 


5880


cccagcaaca 


c c a. a. y g t, y g «. 


c a. a.g a.g a. g u L. 


gagcc(jcicia.c- 


t. Uy Cy cH-dd 


CLcLL' U^cl^cl^cl 


5940


tgcccaccgt 


gcccagcacc 


tgaactcctg 




cagcc t. uccu 



cuucccccca 


6000


aaacccaagg 


acacccccac 


gaucccccgg 


a.c c c c L.ga.gg 



t-ca.ca.t.ycyt. 



99t-ggi-ggac 


6060


gtgagccacg 


aagaccctga 


ggtcaagttc 


aactggtacg 


tggacggcgt 


ggaggcgcau 


6120


aatgccaaga 


caaagccgcg 




tacaacagca 


cgtaccgtgt 


ggtcagcgtc


6180


ctcaccgtcc 


tgcaccagga 


ctggctgaat 


ggcaaggagt 


acaagtgcaa 


ggtctccaac 


6240


aaagccctcc 


cagcccccat 


cgagaaaacc 


atctccaaag


ccaaagggca


gccccgagaa 


6300


ccacaggtgt 


acaccctgcc 


cccatcccgg 


gaggagatga 


ccaagaacca 


ggucagccug 


6360


acctgcctgg 


ucaaaggccu 


cuaucccagc 


gacaccgccg 


cggaguggga. 


ga.gca.a. uggg 




cagccggaga 


acaactacaa 


gaccacgcct 


cccgtgctgg 


actccgacgg 


^ ^ ^ ^ r*i ^ ^ ^< 




ctctatagca 


agctcaccgt 


ggacaagagc 


aggtggcagc 


aggggaacgt 


cttctcatgc 


6540


tccgtgatgc 


acgaggctct 


gcacaaccac 


tacacgcaga 


agagcctctc 


cctgtctccc 


6600


gggaaatgaa 


agccgaattc 


gcccctctcc 


ctcccccccc 


cctaacgtta


ctggccgaag 


6660


ccgcttggaa 


taaggccggt 


gtgcgtttgt 


ctatatgtta 


ttttccacca 


tattgccgtc 


6720


ttucggcaat: 


gtgagggccc 


ggaaacctgg 


ccctgtcttc 


ttgacgagca 


ttcctagggg 


6780


tctttcccct 


ctcgccaaag 


gaatgcaagg 


tctgttgaat 


gtcgtgaagg 


aagcagttcc 


6840


tctggaagct 


tcttgaagac 


aaacaacgtc 


tgtagcgacc 


ctttgcaggc 


agcggaaccc 


6900


cccacctggc 


gacaggtgcc 


tctgcggcca 


aaagccacgt 


gtataagata 


cacctgcaaa 


6960


ggcggcacaa 


ccccagtgcc 


acgttgtgag 


ttggatagtt 


gtggaaagag 


tcaaatggct 


7020


ctcctcaagc 


gtattcaaca 


aggggctgaa 


ggatgcccag 


aaggtacccc 


attgtatggg 


7080


accugaucug 


gggcctcggt 


gcacacgcu u 


cacacgcgu u 


cagccgaggt 


uaaaaaaacg 




tctaggcccc 


ccgaaccacg 


gggacgtggt 


u t ucctt uga 


aaaacacgat 


gataatatgg 


7200


cctcctttgt 


ctctctgctc 


ctggtaggca 


tcctattcca 


tgccacccag 


gccgacatcc 


7260 


agctgaccca 


gagcccaagc 


agcctgagcg 


ccagcgtggg 


tgacagagtg 


accatcacct 


7320 


gtaaggccag 


tcaggatgtg 


ggtacttctg 


tagcctggta 


ccagcagaag 


ccaggtaagg 


7380 


ctccaaagct 


gctgatctac 


tggacatcca 


cccggcacac 


tggtgtgcca 


agcagattca 


7440 
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gcggtagcgg 


tagcggtacc 


gacttcacct 




tcgccaccta 


ctactgccag 


caatatagcc 




tggaaatcaa 


acgaactgtg 


gctgcaccat 




agttgaaatc 


tggaactgcc 


tctgttgtgt 




ccaaagtaca 


gtggaaggtg 


gataacgccc 


cagagcagga 


cagcaaggac 


agcacctaca 




cagactacga 


gaaacacaaa 


gtctacgcct 




ccgtcacaaa 


gagcttcaac 






atatcaagct 


tatcgataat 


caacctctgg 




ttcttaacta 


tgttgctcct 


tttacgctat 


atgctattgc 


ttcccgtatg 


gctttcattt 




ctctttatga 


ggagttgtgg 


cccgttgtca 




ctgacgcaac 


ccccactggt 


tggggcattg 




tcgctttccc 


cctccctatt 


gccacggcgg 




ggacaggggc 


tcggctgttg 


ggcactgaca 


cctttccttg 


gctgctcgcc 


tgtgttgcca 




acgtcccttc 


ggccctcaat 


ccagcggacc 




ggcctcttcc 


gcgtcttcgc 


cttcgccctc 




ccccgcctga 


tcgataccgt 


caacatcgat 




aaagggggga 


atgaaagacc 


ccacctgtag 




ttgcaaggca 


tggaaaaata 


cataactgag 




agatggaaca 


gctgaatatg 


ggccaaacag 




ctcagggcca 


agaacagatg 


gaacagctga 




cagttcctgc 


cccggctcag 


ggccaagaac 




gcagtttcta 


gagaaccatc 


agatgtttcc 


tgccttattt 


gaactaacca 


atcagttcgc 




ccgagctcaa 


taaaagagcc 


cacaacccct 




agtcgcccgg 


gtacccgtgt 


atccaataaa 




tcgctgttcc 


ttgggagggt 


ctcctctgag 




att 

<210> 8 







tcaccatcag 


cagcctccag 


ccagaggaca 


7500 


tctatcggtc 


gttcggccaa 


gggaccaagg 


7560 


ctgtcttcat 


cttcccgcca 


tctgatgagc 


7620 


gcctgctgaa 


taacttctat 


cccagagagg 


7680


tccaatcggg 


taactcccag 


gagagtgtca 


7740


gcctcagcag 


caccctgacg 


ctgagcaaag 


7800


gcgaagtcac 


ccatcagggc 


ctgagctcgc 


7860


gttagagatc 


ccccgggctg 


caggaattcg 


7920


attacaaaat 


ttgtgaaaga 


ttgactggta 


7980 


gtggatacgc 


tgctttaatg 



cctttgtatc 


8040 


tctcctcctt 


gtataaatcc 



tggttgctgt 



8100 


ggcaacgtgg 


cgtggtgtgc 


actgtgtttg 


8160


ccaccacctg 


tcagctcctt 


tccgggactt 


8220


aactcatcgc 


cgcctgcctt 


gcccgctgct 


8280 


attccgtggt 


gttgtcgggg 


aaatcatcgt 


8340 


cctggattct 


gcgcgggacg 


tccttctgct 


8400


ttccttcccg 


cggcctgctg 


ccggctctgc 


8460 


agacgagtcg 


gatctccctt 


tgggccgcct 


8520 


aaaataaaag 


attttattta 


gtctccagaa 


8580 


gtttggcaag 


ctagcttaag 


taacgccatt 


8640 


aatagagaag 


ttcagatcaa 


ggtcaggaac 


8700 


gatatctgtg 


gtaagcagtt 


cctgccccgg 


8760 


atatgggcca 


aacaggatat 


ctgtggtaag 


8820 


agatggtccc 


cagatgcggt 


ccagccctca 



8880 


agggtgcccc 


aaggacctga


aatgaccctg 



8940 


ttctcgcttc 


tgttcgcgcg 


cttctgctcc 


9000 


cactcggggc 


gccagtcctc 


cgattgactg 


9060 


ccctcttgca 


gttgcatccg 


acttgtggtc 


9120 


tgattgacta 


cccgtcagcg 


ggggtctttc 


9180 
9183 
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<211> 5711 






<212> DNA 








<213> Artificial Sequence 




<220> 








<223> Synthetic 




<400> 8 
gatcagtcct 


gggtggtcat 


tgaaaggact 




ccacctgatg 
caggaggaga


cgaagaactg 
agggatgaca 


actcatgtga 
gaggatggaa 




atgagtttga 


gcaagcttcc 


^OT^gttggt 




ccatggggtt


gcaaagagtt 


ggacactact 




catggtacag 


aatataggat 


aaaaaagagg 




ggatataaaa 


gtttagaata 


cctttagttt 




tacccactgc 


aatataagaa 


atcaggcttt 




cataccagaa 


gctaacagct 


attggttata 




tggttatata 


gcatgaagct 


tgatgccagc 




ctaaactcta 


catgttccag 


gacactgatc 




aggctctagg 


tgtatattgt 


ggggcttccc 




caatgtgggt 


gatctgggtt 


cgatccctgg 




caacccactc 


tagtactctt 


acctggaaaa 




agtccatggg


attgcaaaga 


gttgaacaca 




atacacctgt 


gaggtgaagt


gaagtgaagg 




gattctttac 


catctgagcc 


accagggaag 


cttctccagg 


ggatcttccc 


atcccaggaa 




attcttcacc 


agctgaacta 


ccaggtggat




agttttccca 


cctttcccaa 


aaaggttggg 




ctgaggctgt 


ctacaagctt 


atatatttat 




gatttacaat 


gtggtatctg 


gctatttagt 


tagcatctca 


gagggcagct 


agatactgtc 




gaaatagaaa 


gtctctggat 


ctaagttata 




attctactcc 


tgaccactca 


acaaggaacc 




tgcctgggtt 


gagtgggcca 


tgacatatgt 



gatgctgaag 


ttgaagctcc 


aatactttgg 


60 


taagaccctg 


atactgggaa 


agattgaagg 


120 


gagttggatg 


gaatcaccaa 


ctcgatggac 


180 


aatgggcagg 


gaagcctggc 


gtgctgcagt 


240 


gagtgactga 


actgaactga 


tagtgtaatc 


300 


aagagtttgc 


cctgattctg 


aagagttgta 


360 


ggaagtctta 


aattatttac 


ttaggatggg 


420 


agagactgat 


gtagagagaa 


tgagccctgg 


480 


gctgttataa 


ccaatatata 


accaatatat 


540 


aatttgaagg 


aaccatttag 


aactagtatc 


600 


ttaaagctca 


ggttcagaat 


cttgttttat 


660 


tggtggctca 


gatggtaaag 


tgtctgcctg 


720 


cttgggaaga 


tcccctggag 


aaggaaatgg 


780 


ttccatggac 


agaggagcct 


tgtaagctac 


840 


actgagcaac 


taagcacagc 


acagtacagt 


900 


ttcaatgcag 


ggtctcctgc 


attgcagaaa 


960 


cccaagaata 


ctggagtggg 


tagcctattc 


1020 


ttgaactgga 


gtctcctgca 


tttcaggtgg 


1080 


actactccaa 


tattaaagtg 


cttaaagtcc 


1140 


tcactctttt 


ttaaccttct 


gtggcctact 


1200 


gaacacattt 


attgcaagtt 


gttagtttta 


1260 


ggtattggtg 


gttggggatg 


gggaggctga 


1320 


atacacactt 


ttcaagttct 


ccatttttgt 


1380 


tgtgattctc 


agtctctgtg 


gtcatattct 


1440 


aagatatcaa 


gggacacttg 


ttttgtttca 


1500 


tctgggcctt 


gttacatggc 


tggattggtt 


1560 
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ggacaagtgc 


cagctctgat 


cctgggactg 




cattctgcat 


gtctctaggg 


gggaaggggg 




ctgattgcct 


cacttcttat 


attgccccca 




acagtgcttc 


ccagaaccaa 


ccctacaaga 




gcaggatcat 


ggtttgaact 


ctttctggcc 


tgggagaggg 


aaaggaaaag 


tagggtgaat 




ctgtcttggc 


atgaccagtc 


tctcttcatt 




gcccctgagg 


ctttctgcat 


gaatataaat 




gttcttgggg 


gcgccgaatt 


cgagctcggt 




actggcaggt 


gctgggggct 


tccgagacaa 




tcgaccaggg 


tgagatatcg 


gccggggacg 




tacttactgg 


caggtgctgg 


gggcttccga 




ccgcctcgac 


cagggtgaga 


tatcggccgg 




ctcgagaagc 


ttgttgggaa 


ttcaggccat 




ggtctttctc 


ttcttcctgt 


cagtaactac 




gtctccagcc 


tccctatctg 


catctgtggg 




tgggaatatt 


cacaattatt 


tagcatggta 




cctggtctat 


aatgcaaaaa 


ccttagcaga 




atcaggaaca 


caatattctc 


tcaagatcaa 




ttactgtcaa 


catttttgga 


gtactccgtg 




caaacgggct 


gatgctgcac 


caactgtatc 




atctggaggt 


gcctcagtcg 


tgtgcttctt 




caagtggaag 


attgatggca 


gtgaacgaca 




ggacagcaaa 


gacagcacct 


acagcatgag 




tgaacgacat 


aacagctata 


cctgtgaggc 


caagagcttc 


aacaggaatg 


agtgttgaaa 




cctccccccc 


ccctaacgtt 


actggccgaa 




tctatatgtt 


attttccacc 


atattgccgt 




gccctgtctt 


cttgacgagc 


attcctaggg 




gtctgttgaa 


tgtcgtgaag 


gaagcagttc 


ctgtagcgac 


cctttgcagg 


cagcggaacc 



tggcatgtga 


tgacatacac 


cccctctcca 


1620


aagctcggta 


tagaaccttt 


auuguat ucc 


1680


tgcccttctt 


tgttcctcaa 


gtaaccagag 


1740


aacaaagggc 


taaacaaagc 


caaatgggaa


1800


agagaacaat 


acctgctatg 


gactagatac 


1860


tatggaagga 


agctggcagg 


ctcagcgttt 


1920


ctcttcctag 


atgtagggct 


tggtaccaga 


1980


atatgaaact 


gagtgatgct 


tccatttcag 


2040


acccggggat 


ctcgacggat 


ccgattactt 


2100


tcgcgaacat 


ctacaccaca 


caacaccgcc 


2160


cggcggtggt 


aattacaagc


gagatccgat 


2220


gacaatcgcg 


aacatctaca 


ccacacaaca 


2280


ggacgcggcg 


gtggtaatta 


caagcgagat 


2340


cgatcccgcc 


gccaccatgg 


aatggagctg 


2400


aggtgtccac 


tccgacatcc 


agatgaccca 


2460


agaaactgtc 


actatcacat 


gtcgagcaag 




tcagcagaaa 


cagggaaaat 


ctcctcagct 


2580


tggtgtgcca 


tcaaggttca 


gtggcagtgg 


2640


cagcctgcag 


cctgaagatt 


ttgggagtta 


2700


gacgttcggt 


ggaggcacca 


agctggaaat 


2760


catcttccca 


ccatccagtg 


agcagttaac 


2820


gaacaacttc 


taccccaaag 


acatcaatgt 


2880


aaatggcgtc 


ctgaacagtt 


ggactgatca 


2940


cagcaccctc 


acattgacca 


aggacgagta 


3000


cactcacaag 


acatcaactt 


cacccattgt 


3060


gcatcgattt 


cccctgaatt 


cgcccctctc 




gccgcttgga 


ataaggccgg 


tgtgcgtttg 


3180 


cttttggcaa 


tgtgagggcc 


cggaaacctg 


3240 


gtctttcccc 


tctcgccaaa 


ggaatgcaag 


3300 


ctctggaagc 


ttcttgaaga 


caaacaacgt 


3360 


ccccacctgg 


cgacaggtgc 


ctctgcggcc 


3420 
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aaaagccacg 
gttggatagt 
aggatgccca 
ttacatgtgt 
ttttcctttg 
atcctattcc 
aagccagggg 
tttatgcact 
cctgcgaatg 
gacacatcct 
gtctattact 
gtcactgtct 
gctgcccaaa 
ccagtgacag 
gtcctgcagt 
cccagcgaga 
aaaattgtgc 
taatctagag 
aatttgtgaa 
cgctgcttta 
cttgtataaa 
tggcgtggtg 
ctgtcagctc 
cgccgcctgc 
ggtgttgtcg 
tctgcgcggg 
ccgcggcctg 
tcggatctcc 
ctccagaaaa 
acgccatttt 
tcaggaacag 



tgtataagat 
tgtggaaaga 
gaaggtaccc 
ttagtcgagg 
aaaaacacga 
atgccaccca 
cctcagtcaa 
gggtgaagca 
ggaatactga 
ccaacacagt 
gtgctagtgg 
ctgcagccaa 
ctaactccat 
tgacctggaa 
ttgacctcta 
ccgtcacctg 
ccagggattg 
ttaagcggcc 
agattgactg 
atgcctttgt 
tcctggttgc 
tgcactgtgt 
ctttccggga 
cttgcccgct 
gggaaatcat 
acgtccttct 
ctgccggctc 
ctttgggccg 
aggggggaat 
gcaaggcatg 
atggaacagc 



acacctgcaa 
gtcaaatggc 
cattgtatgg 
ttaaaaaaac 
tgataatatg 
ggccgaggtt 
gttgtcctgc 
gaggcctgaa 
atatgacccg 
caacctgcag 
aggggaactg 
aacgacaccc 
ggtgaccctg 
ctctggatcc 
cactctgagc 
caacgttgcc 
tactagtgga 
gtcgagatct 
gtattcttaa 
atcatgctat 
tgtctcttta 
ttgctgacgc 
ctttcgcttt 
gctggacagg 
cgtcctttcc 
gctacgtccc 
tgcggcctct 
cctccccgcc 
gaaagacccc 
gaaaaataca 
tgaatatggg 



aggcggcaca 
tctcctcaag 
gatctgatct 
gtctaggccc 
gcctcctttg 
cagcttcagc 
acagcttctg 
cagggcctgg 
aagttccagg 
ctcagcagcc 
gggtttcctt 
ccatctgtct 
ggatgcctgg 
ctgtccagcg 
agctcagtga 
cacccggcca 
ggtggaggta 
cgacatcgat 
ctatgttgct 
tgcttcccgt 
tgaggagttg 
aacccccact 
ccccctccct 
ggctcggctg 
ttggctgctc 
ttcggccctc 
tccgcgtctt 
tgatcgataa 
acctgtaggt 
taactgagaa 
ccaaacagga 



accccagtgc 
cgtattcaac 
ggggcctcgg 
cccgaaccac 
tctctctgct 
agtctggggc 
gcttcaacat 
agtggattgg 
gcaaggccac 
tgacatctga 
actggggcca 
atccactggc 
tcaagggcta 
gtgtgcacac 
ctgtcccctc 
gcagcaccaa 
gccaccatca 
aatcaacctc 
ccttttacgc 
atggctttca 
tggcccgttg 
ggttggggca 
attgccacgg 
ttgggcactg 
gcctgtgttg 
aatccagcgg 
cgccttcgcc 
aataaaagat 
ttggcaagct 
tagagaagtt 
tatctgtggt 



cacgttgtga 
aaggggctga 
tgcacatgct 
ggggacgtgg 
gctggtaggc
agagcttgtg 
taaagacacc 
aaggattgat 
tataacagca 
ggacactgcc 
agggactctg 
ccctggatct 
tttccctgag 
cttcccagct 
cagcacctgg 
ggtggacaag 
ccatcaccat 
tggattacaa 
tatgtggata 
ttttctcctc 
tcaggcaacg 
ttgccaccac 
cggaactcat 
acaattccgt 
ccacctggat 
accttccttc 
ctcagacgag 
tttatttagt 
agcttaagta 
cagatcaagg 
aagcagttcc 






tgccccggct cagggccaag aacagatgga acagctgaat atgggccaaa caggatatct 
gtggtaagca gttcctgccc cggctcaggg ccaagaacag atggtcccca gatgcggtcc 
agccctcagc agtttctaga gaaccatcag atgtttccag ggtgccccaa ggacctgaaa 
tgaccctgtg ccttatttga actaaccaat cagttcgctt ctcgcttctg ttcgcgcgct 
tctgctcccc gagctcaata aaagagccca caacccctca ctcggggcgc cagtcctccg 
attgactgag tcgcccgggt acccgtgtat ccaataaacc ctcttgcagt tgcatccgac 
ttgtggtctc gctgttcctt gggagggtct cctctgagtg attgactacc cgtcagcggg 
ggtctttcat t 
<210> 9 
<211> 5130 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Synthetic 
<400> 9 



tttgaaagac 


cccacccgta 


ggtggcaagc 


tagcttaagt 


aacgccactt 


tgcaaggcat 


ggaaaaatac 


ataactgaga 


atagaaaagt 


tcagatcaag 


gtcaggaaca 


aagaaacagc 


tgaataccaa 


acaggatatc 


tgtggtaagc 


ggttcctgcc 


ccggctcagg 


gccaagaaca 


gatgagacag 


ctgagtgatg 


ggccaaacag 


gatatctgtg 


gtaagcagtt 


cctgccccgg 


ctcggggcca 


agaacagatg 


gtccccagat 


gcggtccagc 


cctcagcagt 


ttctagtgaa 


tcatcagatg 


tttccagggt 


gccccaagga 


cctgaaaatg 


accctgtacc 


ttatttgaac 


taaccaatca 


gttcgcttct 


cgcttctgtt 


cgcgcgcttc 


cgctctccga 


gctcaataaa 


agagcccaca 


acccctcact 


cggcgcgcca 


gtcttccgat 


agactgcgtc 


gcccgggtac 


ccgtattccc 


aataaagcct 


cttgctgttt 


gcatccgaat 


cgtggtctcg 


ctgttccttg 


ggagggtctc 


ctctgagtga 


ttgactaccc 


acgacggggg 


tctttcattt 


gggggctcgt 


ccgggatttg 


gagacccctg 


cccagggacc 


accgacccac 


caccgggagg 


taagctggcc 


agcaacttat 


ctgtgtctgt 


ccgattgtct 


agtgtctatg 


tttgatgtta 


tgcgcctgcg 


tctgtactag 


ttagctaact 


agctctgtat 


ctggcggacc 


cgtggtggaa 


ctgacgagtt 


ctgaacaccc 


ggccgcaacc 


ctgggagacg 


tcccagggac 


tttgggggcc 


gtttttgtgg 


cccgacctga 


ggaagggagt 


cgatgtggaa 


tccgaccccg 


tcaggatatg 


tggttctggt 


aggagacgag 


aacctaaaac 


agttcccgcc 


tccgtctgaa 


tttttgcttt 


cggtttggaa 


ccgaagccgc 


gcgtcttgtc 


tgctgcagcc 


aagcttgggc 


tgcaggtcga 


ggactgggga 
ccctgcaccg


aacatggaga 


acacaacatc 


aggattccta 


ggacccctgc 


tcgtgttaca 


1080 




ggcggggttt 


ttcttgttga 


caagaatcct 


cacaatacca 


cagagtctag 


actcgtggtg 


1140




gacttctctc 


aattttctag 


ggggagcacc 


cacgtgtcct 


ggccaaaatt 


cgcagtcccc 


1200




aacctccaat 


cactcaccaa 


cctcttgtcc 


tccaatttgt 


cctggctatc 


gctggatgtg 


1260 




tctgcggcgt 


tttatcatat 


tcctcttcat 


cctgctgcta 


tgcctcatct 


tcttgttggt 


1320 


tcttctggac 


taccaaggta 


tgttgcccgt 


ttgtcctcta 


cttccaggaa 


catcaactac 


1380




cagcacggga 


ccatgcaaga 


cctgcacgat 


tcctgctcaa 


ggaacctcta 


tgtttccctc 


1440 




ttgttgctgt 


acaaaacctt 


cggacggaaa 


ctgcacttgt 


attcccatcc 


catcatcctg 


1500 




ggctttcgca 


agattcctat 


gggagtgggc 


ctcagtccgt 


ttctcctggc 


tcagtttact 


1560 




agtgccattt 


gttcagtggt 


tcgtagggct 


ttcccccact 


gtttggcttt 


cagttatatg 


1620 


gatgatgtgg 


tattgggggc 


caagtctgta 


caacatcttg 


agtccctttt 


tacctctatt 


1680 




accaattttc 


ttttgtcttt 


gggtatacat 


ttaaacccta 


ataaaaccaa 


acgttggggc 


1740 




tactccctta 


acttcatggg 


atatgtaatt 


ggatgttggg 


gtactttacc 


gcaagaacat 


1800 




attgtactaa 


aaatcaagca 


atgttttcga 


aaactgcctg 


taaatagacc 


tattgattgg 


1860 




aaagtatgtc 


agagacttgt 


gggtcttttg 


ggctttgctg 


ccccttttac 


acaatgtggc 


1920 


tatcctgcct 


taatgccttt 


atatgcatgt 


atacaatcta 


agcaggcttt 


cactttctcg 


1980 




ccaacttaca 


aggcctttct 


gtgtaaacaa 


tatctgaacc 


tttaccccgt 


tgcccggcaa 


2040 




cggtcaggtc 


tctgccaagt 


gtttgctgac 


gcaaccccca 


ctggatgggg 


cttggctatc 


2100 




ggccatagcc 


gcatgcgcgg 


acctttgtgg 


ctcctctgcc 


gatccatact 


gcggaactcc 


2160 




tagcagcttg 


ttttgctcgc 


aggcggtctg 


gagcgaaact 


tatcggcacc 


gacaactctg 


2220 


ttgtcctctc 


tcggaaatac 


acctcctttc 


catggctgct 


agggtgtgct 


gccaactgga 


2280 




tcccctcagg 


atatagtagt 


ttcgcttttg 


catagggagg 


gggaaatgta 


gtcttatgca 


2340 




atacacttgt 


agtcttgcaa 


catggtaacg 


atgagttagc 


aacatgcctt 


acaaggagag 


2400 




aaaaagcacc 


gtgcatgccg 


attggtggaa 


gtaaggtggt 


acgatcgtgc 


cttattagga 


2460 




aggcaacaga 


caggtctgac 


^tggattgga 


cgaaccactg 


aattccgcat 


tgcagagata 


2520 


attgtattta 


agtgcctagc 


tcgatacagc 


aaacgccatt 


tttgaccatt 


caccacattg 


2580 




gtgtgcacct 


tccaaagctt 


cacgctgccg 


caagcactca 


gggcgcaagg 


gctgctaaag 


2640 




gaagcggaac 


acgtagaaag 


ccagtccgca 


gaaacggtgc 


tgaccccgga 


tgaatgtcag 


2700 




ctactgggct 


atctggacaa 


gggaaaacgc 


aagcgcaaag 


agaaagcagg 


tagcttgcag 


2760 




tgggcttaca 


tggcgatagc 


tagactgggc 


ggttttatgg 


acagcaagcg 


aaccggaatt 


2820 


gccagctggg 


gcgccctctg 


gtaaggttgg 


gaagccctgc 


aaagtaaact 


ggatggcttt 


2880 



tgggccaaac aggatatctg tggtaagcag ttcctgcccc ggctcagggc caagaacaga 4800
tggtccccag atgcggtcca gccctcagca gtttctagag aaccatcaga tgtttccagg 4860 
gtgccccaag gacctgaaat gaccctgtgc cttatttgaa ctaaccaatc agttcgcttc 4920 
tcgcttctgt tcgcgcgctt ctgctccccg agctcaataa aagagcccac aacccctcac 4980 
tcggggcgcc agtcctccga ttgactgagt cgcccgggta cccgtgtatc caataaaccc 5040 
tcttgcagtt gcatccgact tgtggtctcg ctgttccttg ggagggtctc ctctgagtga 5100 
ttgactaccc gtcagcgggg gtctttcatt 5130 
<210> 10
<211> 4661
<212> DNA
<213> Artificial Sequence
<220>
<223> Synthetic
<400> 10





gatcagtcct 


gggtggtcat


tgaaaggact 


gatgctgaag 


ttgaagctcc 


aatactttgg 


60 




ccacctgatg 


cgaagaactg 


actcatgtga 


taagaccctg 


atactgggaa 


agattgaagg 


120 




caggaggaga 


agggatgaca 


gaggatggaa 


gagttggatg 


gaatcaccaa 


ctcgatggac 


180 


atgagtttga 


gcaagcttcc 


aggagttggt 


aatgggcagg 


gaagcctggc 


gtgctgcagt 


240 




ccatggggtt 


gcaaagagtt 


ggacactact 


gagtgactga 


actgaactga 


tagtgtaatc 


300 


catggtacag 


aatataggat 


aaaaaagagg 


aagagtttgc 


cctgattctg 


aagagttgta 


360 




ggatataaaa 


gtttagaata 


cctttagttt 


ggaagtctta 


aattatttac 


ttaggatggg 


420 




tacccactgc 


aatataagaa 


atcaggcttt 


agagactgat 


gtagagagaa 


tgagccctgg 


480 




cataccagaa 


gctaacagct 


attggttata 


gctgttataa 


ccaatatata 


accaatatat 


540 


tggttatata 


gcatgaagct 


tgatgccagc 


aatttgaagg 


aaccatttag 


aactagtatc 


600 




ctaaactcta 


catgttccag 


gacactgatc 


ttaaagctca 


ggttcagaat 


cttgttttat 


660 




aggctctagg 


tgtatattgt 


ggggcttccc 


tggtggctca 


gatggtaaag 


tgtctgcctg 


720 




caatgtgggt 


gatctgggtt 


cgatccctgg 


cttgggaaga 


tcccctggag 


aaggaaatgg 


780 




caacccactc 


tagtactctt 


acctggaaaa 


ttccatggac 


agaggagcct 


tgtaagctac 


840 


agtccatggg 


attgcaaaga 


gttgaacaca 


actgagcaac 


taagcacagc 


acagtacagt 


900 




atacacctgt 


gaggtgaagt 


gaagtgaagg 


ttcaatgcag 


ggtctcctgc 


attgcagaaa 


960 




gattctttac 


catctgagcc 


accagggaag 


cccaagaata 


ctggagtggg 


tagcctattc 


1020 




cttctccagg 


ggatcttccc 


atcccaggaa 


ttgaactgga 


gtctcctgca 


tttcaggtgg 


1080 








cctgaactcc 


tggggggacc 


gtcagtcttc 




atgatctccc 


ggacccctga 


ggtcacatgc 




gaggtcaagt 


tcaactggta 


cgtggacggc 




cgggaggagc 


agtacaacag 


cacgtaccgt 




gactggctga 


atggcaagga 


gtacaagtgc 


atcgagaaaa 


ccatctccaa 


agccaaaggg 




cccccatccc 


gggatgagct 


gaccaagaac 




ttctatccca 


gcgacatcgc 


cgtggagtgg 




aagaccacgc 


ctcccgtgct 


ggactccgac 




gtggacaaga 


gcaggtggca 


gcaggggaac 


ctgcacaacc 


actacacgca 


gaagagcctc 




ggaggtggcg 


cacctacttc 


aagttctaca 




ctgctggatt 


tacagatgat 


tttgaatgga 




aggatgctca 


catttaagtt 


ttacatgccc 




tgtctagaag 


aagaactcaa 


acctctggag 




tttcacttaa 


gacccaggga 


cttaatcagc 




ggatctgaaa 


caacattcat 


gtgtgaatat 




ctgaacagat 


ggattacctt 


ttgtcaaagc 












acatcgataa 


aataaaagat 


tttatttagt 




acctgtaggt 


ttggcaagct 


agcttaagta 




taactgagaa 


tagagaagtt 


cagatcaagg 




ccaaacagga 


tatctgtggt 


aagcagttcc 




acagctgaat 


atgggccaaa 


caggatatct 




ccaagaacag 


atggtcccca 


gatgcggtcc 




atgtttccag 


ggtgccccaa 


ggacctgaaa 


cagttcgctt 


ctcgcttctg 


ttcgcgcgct 








\^ ciy L. ^ L. ^ y 




ccaataaacc 


ctcttgcagt 


tgcatccgac 




cctctgagtg 


attgactacc 


cgtcagcggg 




<210> 11 
















<211> 5691 





ctcttccccc 


caaaacccaa 


ggacaccctc 


3000 


gtggtggtgg 


acgtgagcca 


cgaagaccct 


3060 


gtggaggtgc 


ataatgccaa 


gacaaagccg 


3120 


gtggtcagcg 


tcctcaccgt 


cctgcaccag 


3180 


aaggtctcca 


acaaagccct 


cccagccccc 


3240 


cagccccgag 


aaccacaggt 


gtacaccctg 


3300 


caggtcagcc 


tgacctgcct 


ggtcaaaggc 


3360 


gagagcaatg 


ggcagccgga 


gaacaactac 


3420 


ggctccttct 


tcctctacag 


caagctcacc 


3480 


gtcttctcat 


gctccgtgat 


gcatgaggct 


3540 


tccctgtctc 


cgggtaaagg 


aggcggatca 


3600 


aagaaaacac 


agctacaact 


ggagcattta 


3660 


attaataatt 


acaagaatcc 


caaactcacc 


3720 


aagaaggcca 


cagaactgaa 


acatcttcag 


3780 


gaagtgctaa 


atttagctca 


aagcaaaaac 


3840 


aatatcaacg 


taatagttct 


ggaactaaag 


3900 


gctgatgaga 


cagcaaccat 


tgtagaattt 


3960 


atcatctcaa 


cactaacttg 


aagcttgtta 


4020 


ctccagaaaa 


aggggggaat 


gaaagacccc 


4080 


acgccatttt 


gcaaggcatg 


gaaaaataca 


4140 


tcaggaacag 


atggaacagc 


tgaatatggg 


4200 


tgccccggct 


cagggccaag 


aacagatgga 


4260 


gtggtaagca 


gttcctgccc 


cggctcaggg 


4320 


agccctcagc 


agtttctaga 


gaaccatcag 


4380 


tgaccctgtg 


ccttatttga 


actaaccaat 


4440 


tctgctcccc 


gagctcaata 


aaagagccca 


4500 


attgactgag 


tcgcccgggt 


acccgtgtat 


4560 


ttgtggtctc 


gctgttcctt 


gggagggtct 


4620 


ggtctttcat 


t 




4661 






<212> DNA 

<213> Artificial Sequence 
<220>

<223> Synthetic 



<400> 11 





gatcagtcct 


gggtggtcat 


tgaaaggact 


gatgctgaag 


ttgaagctcc 


aatactttgg 


60 




ccacctgatg 


cgaagaactg 


actcatgtga 


taagaccctg 


atactgggaa 


agattgaagg 


120 




caggaggaga 


agggatgaca 


gaggatggaa 


gagttggatg 


gaatcaccaa 


ctcgatggac 


180 


atgagtttga 


gcaagcttcc 


aggagttggt 


aatgggcagg 


gaagcctggc 


gtgctgcagt 


240 




ccatggggtt 


gcaaagagtt 


ggacactact 


gagtgactga 


actgaactga 


tagtgtaatc 


300 




catggtacag 


aatataggat 


aaaaaagagg 


aagagtttgc 


cctgattctg 


aagagttgta 


360 




ggatataaaa 


gtttagaata 


cctttagttt 


ggaagtctta 


aattatttac 


ttaggatggg 


420 




tacccactgc 


aatataagaa 


atcaggcttt 


agagactgat 


gtagagagaa 


tgagccctgg 


480 




cataccagaa 


gctaacagct 


attggttata 


gctgttataa 


ccaatatata 


accaatatat 


540 




tggttatata 


gcatgaagct 


tgatgccagc 


aatttgaagg 


aaccatttag 


aactagtatc 


600 




ctaaactcta 


catgttccag 


gacactgatc 


ttaaagctca 


ggttcagaat 


cttgttttat 


660 




aggctctagg


tgtatattgt 


ggggcttccc 


tggtggctca 


gatggtaaag 


tgtctgcctg 


720 




caatgtgggt 


gatctgggtt 


cgatccctgg 


cttgggaaga 


tcccctggag 


aaggaaatgg 


780 


caacccactc 


tagtactctt 


acctggaaaa 


ttccatggac 


agaggagcct 


tgtaagctac 


840 




agtccatggg 


attgcaaaga 


gttgaacaca 


actgagcaac 


taagcacagc 


acagtacagt 


900 




atacacctgt 


gaggtgaagt 


gaagtgaagg 


ttcaatgcag 


ggtctcctgc 


attgcagaaa 


960 




gattctttac 


catctgagcc 


accagggaag 


cccaagaata 


ctggagtggg 


tagcctattc 


1020 




cttctccagg 


ggatcttccc 


atcccaggaa 


ttgaactgga 


gtctcctgca 


tttcaggtgg 


1080 


attcttcacc 


agctgaacta 


ccaggtggat 


actactccaa 


tattaaagtg 


cttaaagtcc 


1140 




agttttccca 


cctttcccaa 


aaaggttggg 


tcactctttt 


ttaaccttct 


gtggcctact 


1200 




ctgaggctgt 


ctacaagctt 


atatatttat 


gaacacattt 


attgcaagtt 


gttagtttta 


1260 




gatttacaat 


gtggtatctg 


gctatttagt 


ggtattggtg 


gttggggatg 


gggaggctga 


1320 




tagcatctca 


gagggcagct 


agatactgtc 


atacacactt 


ttcaagttct 


ccatttttgt 


1380 


gaaatagaaa 


gtctctggat 


ctaagttata 


tgtgattctc 


agtctctgtg 


gtcatattct 


1440 




attctactcc 


tgaccactca 


acaaggaacc 


aagatatcaa 


gggacacttg 


ttttgtttca 


1500 




tgcctgggtt 


gagtgggcca 


tgacatatgt 


tctgggcctt 


gttacatggc 


tggattggtt 


1560 




ggacaagtgc 


cagctctgat 


cctgggactg 


tggcatgtga 


tgacatacac 


cccctctcca 


1620 






cattctgcat 
ctgattgcct 
acagtgcttc 
gcaggatcat 
tgggagaggg 
ctgtcttggc 
gcccctgagg 
gttcttgggg 
actggcaggt 
tcgaccaggg 
tacttactgg 
ccgcctcgac 
ctcgagttaa 
ccatgatgtc 
aggtccaact 
cctgcaaggc 
ctggacaggg 
atgagaagtt 
tgcacctcaa 
atgatcttga 
ccccatctgt 
tgggatgcct 
ccctgtccag 
gcagctcagt 
cccacccggc 
gaggtggagg 
ccccccccta 
atgttatttt 
gtcttcttga 
ttgaatgtcg 
gcgacccttt 



gtctctaggg 
cacttcttat 
ccagaaccaa 
ggtttgaact 
aaaggaaaag 
atgaccagtc 
ctttctgcat 
gcgccgaatt 
gctgggggct 
tgagatatcg 
caggtgctgg 
cagggtgaga 
cagatctagg 
ctttgtctct 
gcagcagtct 
ttctggctac 
acttgagtgg 
caagggcaag 
cagcctgacc 
ctactggggc 
ctatccactg 
ggtcaagggc 
cggtgtgcac 
gactgtcccc 
cagcagcacc 
tagctaaggg 
acgttactgg 
ccaccatatt 
cgagcattcc 
tgaaggaagc 
gcaggcagcg 



gggaaggggg 
attgccccca 
ccctacaaga 
ctttctggcc 
tagggtgaat 
tctcttcatt 
gaatataaat 
cgagctcggt 
tccgagacaa 
gccggggacg 
gggcttccga 
tatcggccgg 
cctcctaggt 
ctgctcctgg 
gggcctgagc 
accttcacaa 
attgcatgga 
gccacactga 
tctgaggact 
caaggcacca 
gcccctggat 
tatttccctg 
accttcccag 
tccagcacct 
aaggtggaca 
agatctcgac 
ccgaagccgc 
gccgtctttt 
taggggtctt 
agttcctctg 
gaacccccca 



aagctcggta 
tgcccttctt 
aacaaagggc 
agagaacaat 
tatggaagga 
ctcttcctag 
atatgaaact 
acccggggat 
tcgcgaacat 
cggcggtggt 
gacaatcgcg 
ggacgcggcg 
cgacggatcc 
taggcatcct 
tggtgaagcc 
gctactattt 
tttatcctgg 
ctgcagacaa 
ctgcggtcta 
ctctcacagt 
ctgctgccca 
agccagtgac 
ctgtcctgca 
ggcccagcga 
agaaaattgt 
ggatccccgg 
ttggaataag 
ggcaatgtga 
tcccctctcg 
gaagcttctt 
cctggcgaca 



tagaaccttt 
tgttcctcaa 
taaacaaagc 
acctgctatg 
agctggcagg 
atgtagggct 
gagtgatgct 
ctcgacggat 
ctacaccaca 
aattacaagc 
aacatctaca 
gtggtaatta 
ccgggaattc 
attccatgcc 
tgggacttca 
acactgggtg 
aaatgttatt 
atcctccagc 
tttctgtgca 
ctcctcagcc 
aactaactcc 
agtgacctgg 
gtctgacctc 
gaccgtcacc 
gcccagggat 
gaattcgccc 
gccggtgtgc 
gggcccggaa 
ccaaaggaat 
gaagacaaac 
ggtgcctctg 



attgtatttt 
gtaaccagag 
caaatgggaa 
gactagatac 
ctcagcgttt 
tggtaccaga 
tccatttcag 
ccgattactt 
caacaccgcc 
gagatccgat 
ccacacaaca 
caagcgagat 
ggcgccgcca 
acccaggccc 
gtgaggatat 
aagcagaggc 
actacgtaca 
acagcctaca 
aggggtgacc 
aaaacgacac 
atggtgaccc 
aactctggat 
tacactctga 
tgcaacgttg 
tgtactagtg 
ctctccctcc 
gtttgtctat 
acctggccct 
gcaaggtctg 
aacgtctgta 
cggccaaaag 








ccacgtgtat 


aagatacacc 


tgcaaaggcg 


gcacaacccc 


agtgccacgt 


tgtgagttgg 


3540 




atagttgtgg 


aaagagtcaa 


atggctctcc 


tcaagcgtat 


tcaacaaggg 


gctgaaggat 


3600




gcccagaagg 


taccccattg 


tatgggatct 


gatctggggc 


ctcggtgcac 


atgctttaca


3660




tgtgtttagt 


cgaggttaaa 


aaaacgtcta 


ggccccccga 


accacgggga 


cgtggttttc 


3720




ctttgaaaaa 


cacgatgata 


atatggcctc 


ctttgtctct 


ctgctcctgg 


taggcatcct 


3780 


attccatgcc 


acccaggccg 


acattgtgct 


gacacaatct 


ccagcaatca 


tgtctgcatc 


3840 




tccaggggag 


aaggtcacca 


tgacctgcag 


tgccacctca 


agtgcaagtt


acatacactg 


3900




gtaccagcag 


aagtcaggca 


cctcccccaa 


aagatggatt 


tatgacacat 


ccaaactggc 


3960




ttctggagtc 


cctgctcgct 


tcagtggcag 


tgggtctggg 


acctctcact 


ctctcacact 


4020




cagcagcatg 


gaggctgaag 


atgctgccac 


ttattactgc 


cagcagtggg 


gtagttacct 


4080


cacgttcggt 


gcggggacca 


agctggagct 


gaaacgggct 


gatgctgcac 


caactgtatc 


4140 




catcttccca 


ccatccagtg 


agcagttaac 


atctggaggt 


gcctcagtcg 


tgtgcttctt 


4200




gaacaacttc 


taccccaaag 


acatcaatgt 


caagtggaag 


attgatggca 


gtgaacgaca 


4260 




aaatggcgtc 


ctgaacagtt 


ggactgatca 


ggacagcaaa 


gacagcacct 


acagcatgag 


4320 




cagcaccctc 


acgttgacca 


aggacgagta


tgaacgacat 


aacagctata 


cctgtgaggc 


4380




cactcacaag 


acatcaactt 


cacccattgt 


caagagcttc 


aacaggaatg 


agtgttaata 


4440 




ggggagatct 


cgacatcgat 


aatcaacctc 


tggattacaa 


aatttgtgaa 


agattgactg 


4500 


gtattcttaa 


ctatgttgct 


ccttttacgc 


tatgtggata 


cgctgcttta 


atgcctttgt 


4560 




atcatgctat 


tgcttcccgt 


atggctttca 


ttttctcctc 


cttgtataaa 


tcctggttgc 


4620 




tgtctcttta 


tgaggagttg 


tggcccgttg 


tcaggcaacg 


tggcgtggtg 


tgcactgtgt 


4680 




















ttgctgacgc 


aacccccact 


ggttggggca 


ttgccaccac 


ctgtcagctc 


ctttccggga 


4740 




ctttcgcttt 


ccccctccct 


attgccacgg 


cggaactcat 


cgccgcctgc 


cttgcccgct 


4800 




gctggacagg 


ggctcggctg 


ttgggcactg 


acaattccgt 


ggtgttgtcg 


gggaaatcat 


4860 




cgtcctttcc 


ttggctgctc 


gcctgtgttg 


ccacctggat 


tctgcgcggg 


acgtccttct 


4920 




gctacgtccc 


ttcggccctc 


aatccagcgg 


accttccttc 


ccgcggcctg 


ctgccggctc 


4980 


tgcggcctct 


tccgcgtctt 


cgccttcgcc 


ctcagacgag 


tcggatctcc 


ctttgggccg 


5040 




cctccccgcc 


tgatcgataa 


aataaaagat 


tttatttagt 


ctccagaaaa 


aggggggaat 


5100 




gaaagacccc 


acctgtaggt 


ttggcaagct 


agcttaagta 


acgccatttt 


gcaaggcatg 


5160 




gaaaaataca 


taactgagaa 


tagagaagtt 


cagatcaagg 


tcaggaacag 


atggaacagc 


5220 




tgaatatggg 


ccaaacagga 


tatctgtggt 


aagcagttcc 


tgccccggct 


cagggccaag 


5280 


aacagatgga 


acagctgaat 


atgggccaaa 


caggatatct 


gtggtaagca 


gttcctgccc 


5340 



cggctcaggg ccaagaacag atggtcccca gatgcggtcc agccctcagc agtttctaga 5400 
gaaccatcag atgtttccag ggtgccccaa ggacctgaaa tgaccctgtg ccttatttga 5460 
actaaccaat cagttcgctt ctcgcttctg ttcgcgcgct tctgctcccc gagctcaata 5520 
aaagagccca caacccctca ctcggggcgc cagtcctccg attgactgag tcgcccgggt 5580 
acccgtgtat ccaataaacc ctcttgcagt tgcatccgac ttgtggtctc gctgttcctt 5640
gggagggtct cctctgagtg attgactacc cgtcagcggg ggtctttcat t 5691 
<210> 12 
<211> 668
<212> DNA
<213> Artificial Sequence
<220>
<223> Synthetic
<400> 12



ggaattcgcc 


cctctccctc 


ccccccccct 


aacgttactg 


gccgaagccg 


cttggaataa 


60 


ggccggtgtg 


cgtttgtcta 


tatgttattt 


tccaccatat 


tgccgtcttt 


tggcaatgtg 


120 


agggcccgga 


aacctggccc 


tgtcttcttg 


acgagcattc 


ctaggggtct 


ttcccctctc 


180 


gccaaaggaa 


tgcaaggtct 


gttgaatgtc 


gtgaaggaag 


cagttcctct 


ggaagcttct 


240 


tgaagacaaa 


caacgtctgt 


agcgaccctt 


tgcaggcagc 


ggaacccccc 


acctggcgac 


300 


aggtgcctct 


gcggccaaaa 


gccacgtgta 


taagatacac 


ctgcaaaggc 


ggcacaaccc 


360 


cagtgccacg 


ttgtgagttg 


gatagttgtg 


gaaagagtca 


aatggctctc 


ctcaagcgta 


420 


ttcaacaagg 


ggctgaagga 


tgcccagaag 


gtaccccatt 


gtatgggatc 


tgatctgggg 


480 


cctcggtgca 


catgctttac 


atgtgtttag 


tcgaggttaa 


aaaaacgtct 


aggccccccg 


540 


aaccacgggg 


acgtggtttt 


cctttgaaaa 


acacgatgat 


aatatggcct 


tgctcatcct 


600 


tacctgtctt 


gtggctgttg 


ctcttgccgg 


cgccatggga 


tatctagatc 


tcgagctcgc 


660 


gaaagctt 












668 



<210> 13
<211> 6255
<212> DNA
<213> Artificial Sequence
<220>
<223> Synthetic
<400> 13
tttgaaagac 


cccacccgta 


ggtggcaagc 




ggaaaaatac 


ataactgaga 


atagaaaagt 


tgaataccaa 


acaggatatc 


tgtggtaagc 




gatgagacag 


ctgagtgatg 


ggccaaacag 




ctcggggcca 


agaacagatg 


gtccccagat 




tcatcagatg 


tttccagggt 


gccccaagga 




taaccaatca 


gttcgcttct 


cgcttctgtt 


agagcccaca 


acccctcact 


cggcgcgcca 




ccgtattccc 


aataaagcct 


cttgctgttt 




ggagggtctc 


ctctgagtga 


ttgactaccc 




ccgggatttg 


gagacccctg 


cccagggacc 


agcaacttat 


ctgtgtctgt 


ccgattgtct 


tctgtactag 


ttagctaact 


agctctgtat 




ctgaacaccc 


ggccgcaacc 


ctgggagacg 




cccgacctga 


ggaagggagt 


cgatgtggaa 




aggagacgag


aacctaaaac 


agttcccgcc 




ccgaagccgc 


gcgtcttgtc 


tgctgcagcg 


gactgtgttt 


ctgtatttgt 


ctgaaaatta 




gaccttaggt 


cactggaaag 


atgtcgagcg 




gaagagacgt 


tgggttacct 


tctgctctgc 




gcgagacggc 


acctttaacc 


gagacctcat 




tggcccgcat 


ggacacccag 


accaggtccc 


tgacccccct 


ccctgggtca 


agccctttgt 




atccgccccg 


tctctccccc 


ttgaacctcc 




tccagccctc 


actccttctc 


taggcgccgg 




gatcgtttcg 


catgattgaa 


caagatggat 




agaggctatt 


cggctatgac 


tgggcacaac 


tccggctgtc 


agcgcagggg 


cgcccggttc 




tgaatgaact 


gcaggacgag 


gcagcgcggc 




gcgcagctgt 


gctcgacgtt 


gtcactgaag 




tgccggggca 


ggatctcctg 


tcatctcacc 



tagcttaagt 


aacgccactt 


tgcaaggcat 


60 


tcagatcaag 


gtcaggaaca 


aagaaacagc 


120 


ggttcctgcc 


ccggctcagg 


gccaagaaca 


180 


gatatctgtg 


gtaagcagtt 


cctgccccgg 


240 


gcggtccagc 


cctcagcagt 


ttctagtgaa 


300 


cctgaaaatg 


accctgtacc 


ttatttgaac 


360 


cgcgcgcttc 


cgctctccga 


gctcaataaa 


420 


gtcttccgat 


agactgcgtc 


gcccgggtac 


480 


gcatccgaat 


cgtggtctcg 


ctgttccttg 


540 


acgacggggg 


tctttcattt 


gggggctcgt 


600 


accgacccac 


caccgggagg 


taagctggcc 


660 


agtgtctatg 


tttgatgtta 


tgcgcctgcg 


720 


ctggcggacc 


cgtggtggaa 


ctgacgagtt 


780 


tcccagggac 


tttgggggcc 


gtttttgtgg 


840 


tccgaccccg 


tcaggatatg 


tggttctggt 


900 


tccgtctgaa 


tttttgcttt 


cggtttggaa 


960 


ctgcagcatc 


gttctgtgtt 


gtctctgtct 


1020 


gggccagact 


gttaccactc 


ccttaagttt 


1080 


gatcgctcac 


aaccagtcgg 


tagatgtcaa 


1140 


agaatggcca 


acctttaacg 


tcggatggcc 


1200 


cacccaggtt 


aagatcaagg 


tcttttcacc 


1260 


ctacatcgtg 


acctgggaag 


ccttggcttt 


1320 


acaccctaag 


cctccgcctc 


ctcttcctcc 


1380 


tcgttcgacc 


ccgcctcgat 


cctcccttta 


1440 


aattccgatc 


tgatcaagag 


acaggatgag 


1500 


tgcacgcagg 


ttctccggcc 


gcttgggtgg 


1560 


agacaatcgg 


ctgctctgat 


gccgccgtgt 


1620 


tttttgtcaa 


gaccgacctg 


tccggtgccc 


1680 


tatcgtggct 


ggccacgacg 


ggcgttcctt 


1740 


cgggaaggga 


ctggctgcta 


ttgggcgaag 


1800 


ttgctcctgc 


cgagaaagta 


tccatcatgg 


1860 
ctgatgcaat


gcggcggctg 


catacgcttg 


atccggctac 


ctgcccattc 


gaccaccaag 






cgaaacatcg 


catcgagcga 


gcacgtactc 


ggatggaagc 


cggtcttgtc 


gatcaggatg 


1980




atctggacga 


agagcatcag 


gggctcgcgc 


cagccgaact 


gttcgccagg 


ctcaaggcgc 


2040




gcatgcccga 


cggcgaggat 


ctcgtcgtga 


cccatggcga 


tgcctgcttg 


ccgaatatca 


2100




tggtggaaaa 


tggccgcttt 



tctggattca 


tcgactgtgg 


ccggctgggt 


gtggcggacc 


2160


gctatcagga 


catagcgttg 


gctacccgtg 


atattgctga 


agagcttggc 


ggcgaatggg 


2220




ctgaccgctt 


cctcgtgctt 


tacggtatcg 


ccgctcccga 


ttcgcagcgc 


atcgccttct 


2280




atcgccttct 


tgacgagttc 


ttctgagcgg 


gactctgggg 


ttcgaaatga 


ccgaccaagc 


2340




gacgcccaac 


ctgccatcac 


gagatttcga 


ttccaccgcc 


gccttctatg 


aaaggttggg 


2400




cttcggaatc 


gttttccggg 


acgccggctg 


gatgatcctc 


cagcgcgggg 


atctcatgct 


2460


ggagttcttc 


gcccaccccg 


ggctcgatcc 


cctcgcgagt 


tggttcagct 


gctgcctgag 


2520




gctggacgac 


ctcgcggagt 


tctaccggca 


gtgcaaatcc 


gtcggcatcc 


aggaaaccag 


2580




cagcggctat 


ccgcgcatcc 


atgcccccga 


actgcaggag 


tggggaggca 


cgatggccgc 


2640




tttggtcgag 


gcggatccgg 


ccattagcca 


cactacccac 


cgguEacaiia 


gcataaatca 


2700




atattggcta 


ttggccattg 


catacgttgt 


atccatatca 


taacatgtac 


atttatattg 


2760




gctcatgtcc 


aacattaccg 


ccatgttgac 


attgattatt 


gactagttat 


taatagtaat 


2820




caattacggg 


gtcattagtt 


catagcccat 


atatggagtt 


ccgcgttaca 


taacttacgg 


2880 




taaatggccc 


gcctggctga 


ccgcccaacg 


acccccgccc 


attgacgtca 


ataatgacgt 


2940




atgttcccat 


agtaacgcca 


atagggactt 


tccattgacg 


tcaatgggtg 


gagtatttac 


3000 




ggtaaactgc 


ccacttggca 


gtacatcaag 



tgtatcatat 


gccaagtacg 


ccccctattg 


3060


















acgtcaatga 


cggtaaatgg 


cccgcctggc 


attatgccca 


gtacatgacc 


ttatgggact 


3120 




ttcctacttg 


gcagtacatc 


tacgtattag 


tcatcgctat 


taccatggtg 


atgcggtttt 


3180 




ggcagtacat 


caatgggcgt 


ggatagcggt 


ttgactcacg 


gggatttcca 


agtctccacc 


3240 




ccattgacgt 


caatgggagt 


ttgttttggc 


accaaaatca 


acgggacttt 


ccaaaatgtc 


3300




gtaacaactc 


cgccccattg 


acgcaaatgg 


gcggtaggca 


tgtacggtgg 


gaggtctata 


3360 


taagcagagc 


tcgtttagtg 


aaccgtcaga 


tcgcctggag 


acgccatcca 


cgcugctctg 


3420




acctccatag 


aagacaccgg 


gaccgatcca 


gcctccgcgg 


ccccaagctt 


ctcgacggat 


3480 




ccccgggaat 


tcaggccatc 


gatcccgccg 


ccaccatgga 


atggagctgg


gtctttctct 


3540 




tcttcctgtc 


agtaactaca 


ggtgtccact 


ccgacatcca 


gatgacccag 


tctccagcct 


3600 




ccctatctgc 


atctgtggga 


gaaactgtca 


ctatcacatg 


tcgagcaagt 


gggaatattc 


3660 


acaattattt 


agcatggtat 


cagcagaaac 


agggaaaatc 


tcctcagctc 


ctggtctata 


3720 



atgcaaaaac 
aatattctct 
atttttggag 
atgctgcacc 
cctcagtcgt 
ttgatggcag 
acagcaccta 
acagctatac 
acaggaatga 
cctaacgtta 
ttttccacca 
ttgacgagca 
gtcgtgaagg 
ctttgcaggc 
gtataagata 
gtggaaagag 
aaggtacccc 
tagtcgaggt 
aaaacacgat 
tgccacccag 
ctcagtcaag 
ggtgaagcag 
gaatactgaa 
caacacagtc 
tgctagtgga 
tgcagccaaa 
taactccatg 
gacctggaac 
tgacctctac 
cgtcacctgc 
cagggattgt 



cttagcagat 
caagatcaac 
tactccgtgg 
aactgtatcc 
gtgcttcttg 
tgaacgacaa 
cagcatgagc 
ctgtgaggcc 
gtgttgaaag 
ctggccgaag 
tattgccgtc 
ttcctagggg 
aagcagttcc 
agcggaaccc 
cacctgcaaa 
tcaaatggct 
attgtatggg 
taaaaaaacg 
gataatatgg 
gccgaggttc 
ttgtcctgca 
aggcctgaac 
tatgacccga 
aacctgcagc 
ggggaactgg 
acgacacccc 
gtgaccctgg 
tctggatccc 
actctgagca 
aacgttgccc 
actagtggag 



ggtgtgccat 
agcctgcagc 
acgttcggtg 
atcttcccac 
aacaacttct 
aatggcgtcc 
agcaccctca 
actcacaaga 
catcgatttc 
ccgcttggaa 
ttttggcaat 
tctttcccct 
tctggaagct 
cccacctggc 
ggcggcacaa 
ctcctcaagc 
atctgatctg 
tctaggcccc 
cctcctttgt 
agcttcagca 
cagcttctgg 
agggcctgga 
agttccaggg 
tcagcagcct 
ggtttcctta 
catctgtcta 
gatgcctggt 
tgtccagcgg 
gctcagtgac 
acccggccag 
gtggaggtag 



caaggttcag 
ctgaagattt 
gaggcaccaa 
catccagtga 
accccaaaga 
tgaacagttg 
cattgaccaa 
catcaacttc 
ccctgaattc 
taaggccggt 
gtgagggccc 
ctcgccaaag 
tcttgaagac 
gacaggtgcc 
ccccagtgcc 
gtattcaaca 
gggcctcggt 
ccgaaccacg 
ctctctgctc 
gtctggggca 
cttcaacatt 
gtggattgga 
caaggccact 
gacatctgag 
ctggggccaa 
tccactggcc 
caagggctat 
tgtgcacacc 
tgtcccctcc 
cagcaccaag 
ccaccatcac 



tggcagtgga 
tgggagttat 
gctggaaatc 
gcagttaaca 
catcaatgtc 
gactgatcag 
ggacgagtat 
acccattgtc 
gcccctctcc 
gtgcgtttgt 
ggaaacctgg 
gaatgcaagg 
aaacaacgtc 
tctgcggcca 
acgttgtgag 
aggggctgaa 
gcacatgctt 
gggacgtggt 
ctggtaggca 
gagcttgtga 
aaagacacct 
aggattgatc 
ataacagcag 
gacactgccg 
gggactctgg 
cctggatctg 
ttccctgagc 
ttcccagctg 
agcacctggc 
gtggacaaga 
catcaccatt 



tcaggaacac 
tactgtcaac 
aaacgggctg 
tctggaggtg 
aagtggaaga 
gacagcaaag 
gaacgacata 
aagagcttca 
ctcccccccc 
ctatatgtta 
ccctgtcttc 
tctgttgaat 
tgtagcgacc 
aaagccacgt 
ttggatagtt 
ggatgcccag 
tacatgtgtt 
tttcctttga 
tcctattcca 
agccaggggc 
ttatgcactg 
ctgcgaatgg 
acacatcctc 
tctattactg 
tcactgtctc 
ctgcccaaac 
cagtgacagt 
tcctgcagtc 
ccagcgagac 
aaattgtgcc 
aatctagagt 
taagcggccg tcgagatcta ggcctcctag gtcgacatcg ataaaataaa agattttatt
tagtctccag aaaaaggggg gaatgaaaga ccccacctgt aggtttggca agctagctta 
agtaacgcca ttttgcaagg catggaaaaa tacataactg agaatagaga agttcagatc 
aaggtcagga acagatggaa cagctgaata tgggccaaac aggatatctg tggtaagcag 
ttcctgcccc ggctcagggc caagaacaga tggaacagct gaatatgggc caaacaggat 
atctgtggta agcagttcct gccccggctc agggccaaga acagatggtc cccagatgcg 
gtccagccct cagcagtttc tagagaacca tcagatgttt ccagggtgcc ccaaggacct 
gaaatgaccc tgtgccttat ttgaactaac caatcagttc gcttctcgct tctgttcgcg 
cgcttctgct ccccgagctc aataaaagag cccacaaccc ctcactcggg gcgccagtcc 
tccgattgac tgagtcgccc gggtacccgt gtatccaata aaccctcttg cagttgcatc 
cgacttgtgg tctcgctgtt ccttgggagg gtctcctctg agtgattgac tacccgtcag 
cgggggtctt tcatt 
<210> 14 
<211> 43 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Synthetic 
<400> 14 

ctttgaaaaa cacgatgata atatggcctc ctttgtctct ctg 
<210> 15 
<211> 30 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Synthetic 
<400> 15 

ttcgcgagct cgagatctag atatcccatg 
<210> 16 
<211> 35 
<212> DNA 

<213> Artificial Sequence 
<223> Synthetic

<400> 16 

ctacaggtgt ccacgtcgac atccagctga cccag 

<210> 17 

<211> 34 

<212> DNA 

<213> Artificial Sequence 
<220>

<223> Synthetic 

<400> 17 

ctgcagaata gatctctaac actctcccct gttg 

<210> 18 

<211> 51 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Synthetic 

<400> 18 

cagtgtgatc tcgagaattc aggacctcac catgggatgg agctgtatca 

<210> 19 

<211> 23 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Synthetic 

<400> 19 

aggctgtatt ggtggattcg tct 

<210> 20

<211> 41

<212> DNA

<213> Artificial Sequence
<220>

<223> Synthetic

<400> 20

agcttctcga gttaacagat ctaggcctcc taggtcgaca t

<210> 21

<211> 39

<212> DNA

<213> Artificial Sequence
<220>

<223> Synthetic

<400> 21

cgatgtcgac ctaggaggcc tagatctgtt aactcgaga

<210> 22

<211> 64

<212> DNA

<213> Artificial Sequence
<220>

<223> Synthetic

<400> 22

cgaggctctg cacaaccact acacgcagaa gagcctctcc ctgtctcccg ggaaatgaaa
gccg

<210> 23

<211> 72

<212> DNA

<213> Artificial Sequence
<220>

<223> Synthetic

<400> 23

aattcggctt tcatttcccg ggagacaggg agaggctctt ctgcgtgtag tggttgtgca
gagcctcgtg ca

<210> 24

<211> 41

<212> DNA

<213> Artificial Sequence
<220>

<223> Synthetic 

<400> 24 

aaagcatatg ttctgggcct tgttacatgg ctggattggt t 

<210> 25 

<211> 54 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Synthetic 

<400> 25 

tgaattcggc gcccccaaga acctgaaatg gaagcatcac tcagtttcat 

<210> 26 

<211> 35 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Synthetic 

<400> 26 

ctacaggtgt ccacgtcgac atccagctga cccag 

<210> 27 

<211> 34 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Synthetic 

<400> 27 

ctgcagaata gatctctaac actctcccct gttg 

<210> 28 

<211> 51 

<212> DNA 

<213> Artificial Sequence 
<220> 
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<223> 


Synthetic 

<400> 28 

cagtgtgatc tcgagaattc aggacctcac 

<210> 29 

<211> 22 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Synthetic 

<400> 29 

gtgtcttcgg gtctcaggct gt 

<210> 30 

<211> 41 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Synthetic 

<400> 30 

agcttctcga gttaacagat ctaggcctcc 

<210> 31 

<211> 39 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Synthetic 

<400> 31 

cgatgtcgac ctaggaggcc tagatctgtt 

<210> 32 

<211> 64 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Synthetic 
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113 



tgtctatgtt tgatgttatg cgcctgcgtc tgtactagtt agctaactag ctctgtatct 900 
ggcggacccg tggtggaact gacgagttct gaacacccgg ccgcaaccct gggagacgtc 960 
ccagggactt tgggggccgt ttttgtggcc cgacctgagg aagggagtcg atgtggaatc 1020 
cgaccccgtc aggatatgtg gttctggtag gagacgagaa cctaaaacag ttcccgcctc 1080 
cgtctgaatt tttgctttcg gtttggaacc gaagccgcgc gtcttgtctg ctgcagcgct 1140 
gcagcatcgt tctgtgttgt ctctgtctga ctgtgtttct gtatttgtct gaaaattagg 1200 
gccagactgt taccactccc ttaagtttga ccttaggtca ctggaaagat gtcgagcgga 1260 
tcgctcacaa ccagtcggta gatgtcaaga agagacgttg ggttaccttc tgctctgcag 1320 
aatggccaac ctttaacgtc ggatggccgc gagacggcac ctttaaccga gacctcatca 1380 
cccaggttaa gatcaaggtc ttttcacctg gcccgcatgg acacccagac caggtcccct 1440 
acatcgtgac ctgggaagcc ttggcttttg acccccctcc ctgggtcaag ccctttgtac 1500 
accctaagcc tccgcctcct cttcctccat ccgccccgtc tctccccctt gaacctcctc 1560 
gttcgacccc gcctcgatcc tccctttatc cagccctcac tccttctcta ggcgccggaa 1620 
ttccgatctg atcaagagac aggatgaggg agcttgtata tccattttcg gatctgatca 1680 
gcacgtgttg acaattaatc atcggcatag tatatcggca tagtataata cgacaaggtg 1740 
aggaactaaa ccatggccaa gcctttgtct caagaagaat ccaccctcat tgaaagagca 1800 
acggctacaa tcaacagcat ccccatctct gaagactaca gcgtcgccag cgcagctctc 1860 
tctagcgacg gccgcatctt cactggtgtc aatgtatatc attttactgg gggaccttgt 1920 
gcagaactcg tggtgctggg cactgctgct gctgcggcag ctggcaacct gacttgtatc 1980 
gtcgcgatcg gaaatgagaa caggggcatc ttgagcccct gcggacggtg tcgacaggtg 2040 
cttctcgatc tgcatcctgg gatcaaagcg atagtgaagg acagtgatgg acagccgacg 2100 
gcagttggga ttcgtgaatt gctgccctct ggttatgtgt gggagggcta agcacttcgt 2160 
ggccgaggag caggactgac acgtgctacg agatttcgat tccaccgccg ccttctatga 2220 
aaggttgggc ttcggaatcg ttttccggga cgccggctgg atgatcctcc agcgcgggga 2280 
tctcatgctg gagttcttcg cccaccccaa cttgtttatt gcagcttata atggttacaa 2340 
ataaagcaat agcatcacaa atttcacaaa taaagcattt ttttcactgc attctagttg 2400 
tggtttgtcc aaactcatca atgtatctta tcatgtctgt acgagttggt tcagctgctg 2460 
cctgaggctg gacgacctcg cggagttcta ccggcagtgc aaatccgtcg gcatccagga 2520 
aaccagcagc ggctatccgc gcatccatgc ccccgaactg caggagtggg gaggcacgat 2580 
ggccgctttg gtcgaggcgg atccggccat tagccatatt attcattggt tatatagcat 2640 
aaatcaatat tggctattgg ccattgcata cgttgtatcc atatcataat atgtacattt 2700 
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atattggctc 


atgtccaaca ttaccgccat gttgacattg attattgact agttattaat 2760 
agtaatcaat tacggggtca ttagttcata gcccatatat ggagttccgc gttacataac 2820 
ttacggtaaa tggcccgcct ggctgaccgc ccaacgaccc ccgcccattg acgtcaataa 2880 
tgacgtatgt tcccatagta acgccaatag ggactttcca ttgacgtcaa tgggtggagt 2940 
atttacggta aactgcccac ttggcagtac atcaagtgta tcatatgcca agtacgcccc 3000 
ctatcgacgc caatgacggt aaatggcccg cctggcatta tgcccagtac atgaccttat 3060 
gggactttcc tacttggcag tacatctacg tattagtcat cgctattacc atggtgatgc 3120 
ggttttggca gtacatcaat gggcgtggat agcggtttga ctcacgggga tttccaagtc 3180 
tccaccccat tgacgtcaat gggagtttgt tttggcacca aaatcaacgg gactttccaa 3240 
aatgtcgtaa caactccgcc ccattgacgc aaatgggcgg taggcatgta cggtgggagg 3300 
tctatataag cagagctcgt ttagtgaacc gtcagatcgc ctggagacgc catccacgct 3360 
gttttgacct ccatagaaga caccgggacc gatccagcct ccgcggcccc aagcttctcg 3420 
agttaacaga tctaggctgg cacgacaggt ttcccgactg gaaagcgggc agtgagcgca 3480 
acgcaattaa tgtgagttag ctcactcatt aggcacccca ggctttacac tttatgcttc 3540 
cggctcgtat gttgtgtgga attgtgagcg gataacaatt tcacacagga aacagctatg 3600 
accatgatta cgccaagctt ggctgcaggt cgacggatcc actagtaacg gccgccagtg 3660 
tgctggaatt caccatgggg caacccggga acggcagcgc cttcttgctg gcacccaatg 3720 
gaagccatgc gccggaccac gacgtcacgc agcaaaggga cgaggtgtgg gtggtgggca 3780 
tgggcatcgt catgtctctc atcgtcctgg ccatcgtgtt tggcaatgtg ctggtcatca 3840 
cagccattgc caagttcgag cgtctgcaga cggtcaccaa ctacttcatc acaagcttgg 3900 
cctgtgctga tctggtcatg gggctagcag tggtgccctt tggggccgcc catattctca 3960 
tgaaaatgtg gacttttggc aacttctggt gcgagttctg gacttccatt gatgtgctgt 4020 
gcgtcacggc atcgattgag accctgtgcg tgatcgcagt cgaccgctac tttgccatta 4080 
ctagcccttt caagtaccag agcctgctga ccaagaataa ggcccgggtg atcattctga 4140 
tggtgtggat tgtgtcaggc cttacctcct tcttgcccat tcagatgcac tggtacaggg 4200 
ccacccacca ggaagccatc aactgctatg ccaatgagac ctgctgtgac ttcttcacga 4260 
accaagccta tgccattgcc tcttccatcg tgtccttcta cgttcccctg gtgatcatgg 4320 
tcttcgtcta ctccagggtc tttcaggagg ccaaaaggca gctccagaag attgacaaat 4380 
ctgagggccg cttccatgtc cagaacctta gccaggtgga gcaggatggg cggacggggc 4440 
atggactccg cagatcttcc aagttctgct tgaaggagca caaagccctc aagacgttag 4500 
gcatcatcat gggcactttc accctctgct ggctgccctt cttcatcgtt aacattgtgc 4560 
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atgtgatcca 




ggataacctc atccgtaagg aagtttacat cctcctaaat tggataggct 4620 
atgtcaattc tggtttcaat ccccttatct actgccggag cccagatttc aggattgcct 4680 
tccaggagct tctgtgcctg cgcaggtctt ctttgaaggc ctatggcaat ggctactcca 4740 
gcaacggcaa cacaggggag cagagtggat atcacgtgga acaggagaaa gaaaataaac 4800 
tgctgtgtga agacctccca ggcacggaag actttgtggg ccatcaaggt actgtgccta 4860 
gcgataacat tgattcacaa gggaggaatt gtagtacaaa tgactcactg ctctcgagaa 4920 
tcgaggggcg gcaccaccat catcaccacg tcgaccccgg ggactacaag gatgacgatg 4980 
acaagtaagc tttatccatc acactggcgg ccgctcgagc atgcatctag cggccgctcg 5040 
[illegible] ggccggatcc ccgggaattc gcccctctcc ctcccccccc cctaacgtta 5100 
ctggccgaag ccgcttggaa taaggccggt gtgcgtttgt ctatatgtta ttttccacca 5160 
tattgccgtc ttttggcaat gtgagggccc ggaaacctgg ccctgtcttc ttgacgagca 5220 
ttcctagggg tctttcccct ctcgccaaag gaatgcaagg tctgttgaat gtcgtgaagg 5280 
aagcagttcc tctggaagct tcttgaagac aaacaacgtc tgtagcgacc ctttgcaggc 5340 
agcggaaccc cccacctggc gacaggtgcc tctgcggcca aaagccacgt gtataagata 5400 
cacctgcaaa ggcggcacaa ccccagtgcc acgttgtgag ttggatagtt gtggaaagag 5460 
tcaaatggct ctcctcaagc gtattcaaca aggggctgaa ggatgcccag aaggtacccc 5520 
attgtatggg atctgatctg gggcctcggt gcacatgctt tacatgtgtt tagtcgaggt 5580 
taaaaaaacg tctaggcccc ccgaaccacg gggacgtggt tttcctttga aaaacacgat 5640 
gataatatgg cctcctttgt ctctctgctc ctggtaggca tcctattcca tgccacccag 5700 
gccgagctca cccagtctcc agactccctg gctgtgtctc tgggcgagag ggccaccatc 5760 
aactgcaagt ccagccagag tgttttgtac agctccaaca ataagaacta tttagcttgg 5820 
tatcagcaga aaccaggaca gcctcctaag ctgctcattt actgggcatc tacccgggaa 5880 
tccggggtcc ctgaccgatt cagtggcagc gggtctggga cagatttcac tctcaccatc 5940 
agcagcctgc aggctgaaga tgtggcagtt tattactgtc agcaatatta tagtactcag 6000 
acgttcggcc aagggaccaa ggtggaaatc aaacgaactg tggctgcacc atctgtcttc 6060 
atcttcccgc catctgatga gcagttgaaa tctggaactg cctctgttgt gtgcctgctg 6120 
aataacttct atcccagaga ggccaaagta cagtggaagg tggataacgc cctccaatcg 6180 
ggtaactccc aggagagtgt cacagagcag gacagcaagg acagcaccta cagcctcagc 6240 
agcaccctga cgctgagcaa agcagactac gagaaacaca aactctacgc ctgcgaagtc 6300 
acccatcagg gcctgagatc gcccgtcaca aagagcttca acaaggggag agtgttagtt 6360 
ctagataatt aattaggagg agatctcgag ctcgcgaaag cttggcactg gccgtcgttt 6420 
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tacaacgtcg 


tgactgggaa aaccctggcg ttacccaact taatcgcctt gcagcacatc 6480 
cccctttcgc cagcctccta ggtcgacatc gataaaataa aagattttat ttagtctcca 6540 
gaaaaagggg ggaatgaaag accccacctg taggtttggc aagctagctt aagtaacgcc 6600 
attttgcaag gcatggaaaa atacataact gagaatagag aagttcagat caaggtcagg 6660 
aacagatgga acagctgaat atgggccaaa caggatatct gtggtaagca gttcctgccc 6720 
cggctcaggg ccaagaacag atggaacagc tgaatatggg ccaaacagga tatctgtggt 6780 
aagcagttcc tgccccggct cagggccaag aacagatggt ccccagatgc ggtccagccc 6840 
tcagcagttt ctagagaacc atcagatgtt tccagggtgc cccaaggacc tgaaatgacc 6900 
ctgtgcctta tttgaactaa ccaatcagtt cgcttctcgc ttctgttcgc gcgcttctgc 6960 
tccccgagct caataaaaga gcccacaacc cctcactcgg ggcgccagtc ctccgattga 7020 
ctgagtcgcc cgggtacccg tgtatccaat aaaccctctt gcagttgcat ccgacttgtg 7080 
gtctcgctgt tccttgggag ggtctcctct gagtgattga ctacccgtca gcgggggtct 7140 
ttcatttggg ggctcgtccg ggatcgggag acccctgccc agggaccacc gacccaccac 7200 
cgggaggtaa gctggctgcc tcgcgcgttt cggtgatgac ggtgaaaacc tctgacacat 7260 
gcagctcccg gagacggtca cagcttgtct gtaagcggat gccgggagca gacaagcccg 7320 
tcagggcgcg tcagcgggtg ttggcgggtg tcggggcgca gccatgaccc agtcacgtag 7380 
cgatagcgga gtgtatactg gcttaactat gcggcatcag agcagattgt actgagagtg 7440 
caccatatgc ggtgtgaaat accgcacaga tgcgtaagga gaaaataccg catcaggcgc 7500 
tcttccgctt cctcgctcac tgactcgctg cgctcggtcg ttcggctgcg gcgagcggta 7560 
tcagctcact caaaggcggt aatacggtta tccacagaat caggggataa cgcaggaaag 7620 
aacatgtgag caaaaggcca gcaaaaggcc aggaaccgta aaaaggccgc gttgctggcg 7680 
tttttccata ggctccgccc ccctgacgag catcacaaaa atcgacgctc aagtcagagg 7740 
tggcgaaacc cgacaggact ataaagatac caggcgtttc cccctggaag ctccctcgtg 7800 
cgctctcctg ttccgaccct gccgcttacc ggatacctgt ccgcctttct cccttcggga 7860 
agcgtggcgc tttctcatag ctcacgctgt aggtatctca gttcggtgta ggtcgttcgc 7920 
tccaagctgg gctgtgtgca cgaacccccc gttcagcccg accgctgcgc cttatccggt 7980 
aactatcgtc ttgagtccaa cccggtaaga cacgacttat cgccactggc agcagccact 8040 
ggtaacagga ttagcagagc gaggtatgta ggcggtgcta cagagttctt gaagtggtgg 8100 
cctaactacg gctacactag aaggacagta tttggtatct gcgctctgct gaagccagtt 8160 
accttcggaa aaagagttgg tagctcttga tccggcaaac aaaccaccgc tggtagcggt 8220 
ggtttttttg tttgcaagca gcagattacg cgcagaaaaa aaggatctca agaagatcct 8280 
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ttgatctttt ctacggggtc tgacgctcag tggaacgaaa actcacgtta agggattttg 834 0 

gtcatgagat tatcaaaaag gatcttcacc tagatccttt taaattaaaa atgaagtttt 8400 
aaatcaatct aaagtatata tgagtaaact tggtctgaca gttaccaatg cttaatcagt 8460 
gaggcaccta tctcagcgat ctgtctattt cgttcatcca tagttgcctg actccccgtc 8520 
gtgtagataa ctacgatacg ggagggctta ccatctggcc ccagtgctgc aatgataccg 8580 
cgagacccac gctcaccggc tccagattta tcagcaataa accagccagc cggaagggcc 8640 
gagcgcagaa gtggtcctgc aactttatcc gcctccatcc agtctattaa ttgttgccgg 8700 
gaagctagag taagtagttc gccagttaat agtttgcgca acgttgttgc cattgctgca 8760 
ggcatcgtgg tgtcacgctc gtcgtttggt atggcttcat tcagctccgg ttcccaacga 8820 
tcaaggcgag ttacatgatc ccccatgttg tgcaaaaaag cggttagctc cttcggtcct 8880 
ccgatcgttg tcagaagtaa gttggccgca gtgttatcac tcatggttat ggcagcactg 8940 
cataattctc ttactgtcat gccatccgta agatgctttt ctgtgactgg tgagtactca 9000 
accaagtcat tctgagaata gtgtatgcgg cgaccgagtt gctcttgccc ggcgtcaaca 9060 
cgggataata ccgcgccaca tagcagaact ttaaaagtgc tcatcattgg aaaacgttct 9120 
tcggggcgaa aactctcaag gatcttaccg ctgttgagat ccagttcgat gtaacccact 9180 
cgtgcaccca actgatcttc agcatctttt actttcacca gcgtttctgg gtgagcaaaa 9240 
acaggaaggc aaaatgccgc aaaaaaggga ataagggcga cacggaaatg ttgaatactc 9300 
atactcttcc tttttcaata ttattgaagc atttatcagg gttattgtct catgagcgga 9360 
tacatatttg aatgtattta gaaaaataaa caaatagggg ttccgcgcac atttccccga 9420 
aaagtgccac ctgacgtcta agaaaccatt attatcatga cattaaccta taaaaatagg 9480 
cgtatcacga ggccctttcg tcttcaagaa t 9511 
<210> 35 
<211> 30 
<212> DNA 

<213> Artificial Sequence 

<220> 

<223> Synthetic 
<400> 35 

gatccactag taacggccgc cagaattcgc 30 
<210> 36 
<211> 43 
<212> DNA 
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<213> Artificial Sequence 
<220> 

<223> Synthetic 
<400> 36 

cagagagaca aaggaggcca tattatcatc gtgtttttca aag 43 



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